Medicago truncatula is a model plant for studying legume biology. One of the main distinctive features of this plant family is its ability to interact with beneficial microorganisms, which leads to the formation of nitrogen-fixing root nodules and of phosphate-acquiring arbuscular mycorrhiza (AM). Both symbioses of Medicago truncatula are being investigated in various international research projects.
Oligonucleotide microarrays are a robust technique for examining the expression of thousands of genes in parallel. Affymetrix GeneChips®, a more recently designed type of gene-specific chip, make it easier for researchers to compare and evaluate gene expression and are therefore expected to yield more accurate results. Accordingly, Medicago GeneChips® are becoming a focus of gene expression analysis in this model plant. Software applications for the analysis of GeneChips® are mostly commercial or are implemented as command-line tools without a graphical user interface. Furthermore, comparing the results with analyses of previously performed oligonucleotide microarray experiments is difficult, because analysis pipelines and methods differ between applications. In the scope of this thesis, EMMA2, an application for the analysis of oligonucleotide microarrays, was extended to load, store, and analyze Affymetrix GeneChips® in a way that is as comparable as possible to the analysis of oligonucleotide datasets.
Databases for sequence, annotation, or microarray experiment datasets are extremely beneficial to the research community, as they centrally gather information from experiments performed by different scientists. However, datasets from different sources reach their full potential only when they are combined. A data warehouse addresses this problem directly by integrating all required data into a single database, and many data warehouses are already available in genetics. For the model legume Medicago truncatula, however, there was no single data warehouse that integrated all freely available gene sequences, the corresponding gene expression data, and annotation information.
The TRUNCATULIX data warehouse was created in the scope of this thesis to store Medicago truncatula sequence, annotation, and expression datasets and to offer them to the legume community. Several filter steps allow precise queries for genes and expression values in a database of over 200,000 gene sequences and over 200 hybridizations. For the first time, users can quickly search a large database for specific genes and gene expression datasets based on high-quality annotations. The results can be exported as Excel, HTML, or CSV files for further use.
A multitude of EST and microarray experiments have been conducted for Medicago truncatula, covering different tissues, cell states, and cell types. This raises the challenge of integrating the results of the different expression analysis methods, with the goal of discovering novel information from the combined datasets. The application MediPlEx was designed to allow an integrated expression analysis of the Medicago truncatula datasets stored in SAMS and in the TRUNCATULIX data warehouse. After genes of interest have been selected by their expression conditions, their expression profiles are combined for hierarchical clustering. The results are presented in a table, as a cluster dendrogram, and in an interactive 3D application.
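The general idea of selecting genes, combining their profiles, and clustering them hierarchically can be illustrated with a minimal Python sketch. It is not the MediPlEx implementation: the gene names, the values, the selection threshold, the function names, and the choice of Euclidean distance with average linkage are all assumptions made for illustration, and any normalization between the two data sources is omitted.

```python
from itertools import combinations
from math import dist

# Hypothetical example data (not taken from SAMS or TRUNCATULIX).
microarray = {  # e.g. log ratios from two hybridizations
    "geneA": [2.0, 2.1],
    "geneB": [2.2, 1.9],
    "geneC": [-1.5, -1.8],
    "geneD": [0.1, 0.0],
}
est = {  # e.g. EST-derived expression values from one library
    "geneA": [5.0],
    "geneB": [5.5],
    "geneC": [0.0],
    "geneD": [1.0],
}


def select_genes(profiles, min_abs_value):
    """Keep genes with at least one value whose magnitude reaches the threshold."""
    return {
        gene: values
        for gene, values in profiles.items()
        if any(abs(v) >= min_abs_value for v in values)
    }


def combine_profiles(first, second):
    """Concatenate the profiles of genes that occur in both sources."""
    return {gene: first[gene] + second[gene] for gene in first if gene in second}


def average_linkage(profiles):
    """Agglomerative clustering; returns merges as (cluster, cluster, distance)."""

    def cluster_distance(c1, c2):
        # Mean Euclidean distance over all gene pairs between the two clusters.
        pairs = [(x, y) for x in c1 for y in c2]
        return sum(dist(profiles[x], profiles[y]) for x, y in pairs) / len(pairs)

    clusters = [(gene,) for gene in sorted(profiles)]
    merges = []
    while len(clusters) > 1:
        # Merge the two closest clusters and record the step.
        c1, c2 = min(
            combinations(clusters, 2), key=lambda pair: cluster_distance(*pair)
        )
        merges.append((c1, c2, cluster_distance(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


selected = select_genes(microarray, min_abs_value=1.0)
combined = combine_profiles(selected, est)
merges = average_linkage(combined)
```

The recorded merge order and distances are the information a dendrogram displays. In practice, values from different methods would need to be brought onto a comparable scale before they are combined.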
The three parts of this thesis have been published (Dondrup et al., 2009; Henckel et al., 2009) or submitted for publication (Henckel et al., 2010).