With the accumulation of gene and protein sequence data in publicly available databases and the development of computational methods for their comparison, sequence analysis has become an extremely powerful tool for uncovering functional properties of these molecules. In general, however, biological function results from many interacting molecules that form large interaction networks, such as regulatory networks or metabolic pathways. To improve our understanding of the phenotype of organisms, it is therefore of great value to analyze not only individual genes but also these interaction networks. In particular, the growing amount of publicly available data on metabolic pathways, together with functional annotation data for sequenced organisms, enables the large-scale comparison of organisms based on their metabolic reaction networks.
In this thesis, a fully automated approach for the comparative analysis of organisms at the functional level of metabolism is developed. It yields a classification of the analyzed organisms according to their individual metabolic pathway variants. In contrast to gene sequence-based comparison techniques, this approach is based on the functional annotation of genes, that is, on the metabolic reactions annotated for them. Moreover, instead of comparing individual reactions one at a time, it compares metabolic pathways, which are sets of reactions that are jointly involved in the same cellular process.
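To make the idea of classifying organisms by pathway variant concrete, the following minimal Python sketch groups organisms that carry the same subset of a pathway's reactions. It is an illustration under simplifying assumptions, not the method implemented in this thesis: a pathway variant is taken to be simply the set of pathway reactions annotated for an organism, and the function names, organism names, and reaction identifiers are all hypothetical placeholders.

```python
from collections import defaultdict


def pathway_variant(organism_reactions, pathway_reactions):
    """Return the subset of the pathway's reactions annotated for one organism."""
    return frozenset(organism_reactions) & frozenset(pathway_reactions)


def classify_by_variant(annotations, pathway_reactions):
    """Group organisms that share exactly the same variant of a pathway.

    annotations: dict mapping organism name -> set of annotated reaction IDs
    pathway_reactions: set of reaction IDs that make up the reference pathway
    """
    groups = defaultdict(list)
    for organism in sorted(annotations):
        variant = pathway_variant(annotations[organism], pathway_reactions)
        groups[variant].append(organism)
    return dict(groups)


# Hypothetical toy data: reaction IDs and organism names are placeholders.
pathway = {"R1", "R2", "R3", "R4"}
annotations = {
    "org_A": {"R1", "R2", "R3", "R9"},  # R9 lies outside the pathway and is ignored
    "org_B": {"R1", "R2", "R3"},
    "org_C": {"R1", "R4"},
}

classes = classify_by_variant(annotations, pathway)
for variant, organisms in classes.items():
    print(sorted(variant), organisms)
```

In this toy example, org_A and org_B fall into the same class because they share the same reactions within the pathway, while org_C forms a class of its own. Comparing whole reaction sets rather than single reactions is what distinguishes a pathway-level comparison from a reaction-by-reaction one.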
As an application example, five Corynebacteria are compared with each other using the newly developed approach, and the results are discussed in light of their biological relevance. This example demonstrates the benefit of the approach for improving knowledge of the habitat and lifestyle of organisms and the respective metabolic prerequisites, for detecting potential drug targets, and for improving the functional annotation of the respective genomes.
A web frontend called CPA (Comparative Pathway Analyzer), available at http://cpa.cebitec.uni-bielefeld.de, can be used free of charge to apply the developed approach to genome and pathway data from the KEGG database or to data provided by the user.