Over the last decade, biological research has changed fundamentally. The reductionist approach of studying only a few biological entities at a time is being replaced by the study of the biological system as a whole. Systems Biology seeks to understand how complex biological systems work by looking at all of their parts, how these parts interact with each other, and how they form the complete system. This requires that existing biological knowledge (data) be made available to support, on the one hand, the analysis of experimental results and, on the other hand, the construction and enrichment of models for Systems Biology.
Effective integration of biological knowledge from databases scattered around the internet and from other information resources (for example, experimental data) is recognized as a prerequisite for many aspects of Systems Biology research. However, systems for the integration of biological knowledge have to overcome several challenges. For example, biological data sources may have similar or overlapping coverage, so the user of such a system must either generate a consensus data set or select the "best" data source. Furthermore, data integration poses many technical challenges, such as different access methods to databases, different data formats, different naming conventions, and erroneous or missing data.
To address these challenges and enable effective integration of biological knowledge in support of Systems Biology research, the ONDEX system presented in this thesis was created. ONDEX provides an integrated view across biological data sources, with the aim of enabling the user to gain a better understanding of biology from integrated knowledge. ONDEX is supported by BBSRC (http://www.bbsrc.ac.uk/) as part of the Systems Approaches to Biological Research (SABR) initiative and is now mainly being developed at Rothamsted Research, Manchester University and Newcastle University. The first ONDEX prototype was developed at the University of Bielefeld.
ONDEX uses a three-step approach to address the challenges outlined above: importing data from the data sources, identifying overlapping or similar data across the different data sources, and analysing the resulting integrated datasets to reveal new biological insights. This thesis presents the most interesting aspects of the ONDEX system in detail; all other parts of the system are described only briefly where appropriate.