Cells in the human body communicate over long distances via two systems: the humoral system and the neuronal system. The humoral system works via first messenger substances, such as hormones, cytokines and neurotransmitters, which are released into the blood. Biomedical knowledge about this kind of intercellular signaling is well established, but, in contrast to knowledge about signaling processes inside cells, little of it exists in a form that is easily accessible to automated approaches, such as databases or ontologies. Most of what is known about extracellular signaling is stored as natural language text in the scientific literature.
The present study aims to reconstruct and analyze cell-cell signaling pathways by applying automated approaches. To this end, relevant data are extracted from molecular databases as well as from the biomedical literature by applying concept-based text mining. Because the available data sources are scattered and incomplete, models and corresponding graph representations are developed to assemble intercellular signals from partial information. The resulting information is finally used to generate hypotheses on cell-cell signaling in the context of neurodegenerative diseases.
More specifically, of the few molecular databases containing appropriate data, one is tested in a preliminary study, and reconstruction approaches that exploit the specific structure of this database are developed. To reconstruct information from natural language text, ONDEX, a framework for ONtological text inDEXing and data integration, has been developed in a collaborative effort. ONDEX supports concept-based approaches, i.e., databases and ontologies are integrated into a standardized graph-based framework in which biological entities are represented as concepts linked by relations (e.g., "is-a", "part-of" or "synonym"). A major part of this thesis is the development of concept-based text indexing and concept-based co-occurrence searches and their integration into ONDEX. On this basis, MEDLINE abstracts are mapped to concepts of a number of ontologies (e.g., Gene Ontology, MeSH terms and Cell Ontology) and mined for relations that describe parts of intercellular signaling. Finally, cell-cell signaling hypotheses are assembled from these relations.
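The idea of concept-based indexing followed by a co-occurrence search can be illustrated with a minimal sketch. It is not the ONDEX implementation or its API: the concept identifiers, the tiny synonym table, the example sentences and the function names `index_text` and `cooccurrences` are all hypothetical and chosen only for illustration. The sketch maps each text to the concepts whose names or synonyms it mentions and then counts how often two concepts appear in the same text.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical mini-ontology (assumption, not taken from ONDEX):
# concept id -> (concept class, list of names and synonyms).
CONCEPTS = {
    "CELL:microglia": ("cell", ["microglia", "microglial cell"]),
    "CELL:astrocyte": ("cell", ["astrocyte", "astroglia"]),
    "MSG:il1b": ("messenger", ["interleukin-1 beta", "IL-1beta"]),
}

# Invented example sentences standing in for abstracts.
ABSTRACTS = [
    "Activated microglial cells release interleukin-1 beta.",
    "IL-1beta acts on astrocytes in the injured brain.",
    "Microglia and astroglia both respond to IL-1beta.",
]


def index_text(text):
    """Return the set of concept ids whose name or synonym occurs in text."""
    found = set()
    for concept_id, (_, names) in CONCEPTS.items():
        for name in names:
            # Whole-word, case-insensitive match with an optional plural "s".
            pattern = r"\b" + re.escape(name) + r"s?\b"
            if re.search(pattern, text, re.IGNORECASE):
                found.add(concept_id)
                break
    return found


def cooccurrences(texts):
    """Count, per concept pair, the number of texts mentioning both concepts."""
    counts = Counter()
    for text in texts:
        concepts = sorted(index_text(text))
        for pair in combinations(concepts, 2):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    for pair, n in cooccurrences(ABSTRACTS).most_common():
        print(pair, n)
```

Because synonyms are resolved to concept identifiers before counting, "microglial cells" and "Microglia" contribute to the same pair. A real system would additionally have to handle ambiguous names and restrict pairs to meaningful concept classes, which this sketch does not attempt.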
Whereas the networks resulting from the database reconstruction are not sufficient for reasonable analysis and further use, evaluations of the text mining results show that a significant number of known facts can be found by applying concept-based co-occurrence searches. Finally, the text extraction results are reduced to a manageable number of concept-based co-occurrence hits and hypotheses for cell types involved in neurodegenerative diseases. In this case, a number of known facts are reconstructed, and suggestions for further improvements are made.
The text extraction results demonstrate that relations between biological entities can be reconstructed from text by applying a concept-based framework, and thus how a large text set can be reduced to a number of hypotheses small enough for manual examination.