In times of multi-resistant pathogenic bacteria, their detailed study is of utmost importance. Comparative analysis of these bacteria can even aid the emerging field of personalized medicine by enabling treatment optimized for the virulence factors and antibiotic resistances present in the infection concerned. The weaknesses and functionality of these pathogens can be investigated using modern computer science and novel sequencing technologies. One such method is the bioinformatics evaluation of high-throughput sequencing data.
A pathogenic bacterium that poses severe health care issues is the ubiquitous Pseudomonas aeruginosa. It is involved in a wide range of infections, mainly affecting the pulmonary or urinary tract, open wounds and burns. In Germany alone, the prevalence of chronic obstructive pulmonary disease cases with P. aeruginosa is ~600,000 per year. Within the framework of this dissertation, computational comparative genomics experiments were conducted with a panel of 20 of the most abundant P. aeruginosa strains. Fifteen of these strains were isolated from clinical cases, while the remaining five were isolated from the environment and have no known infection history. This division was chosen to enable a direct comparison of the pathogenic potential of clinical and environmental strains and the identification of possible characteristic differences between them.
During the design of the bioinformatics experiments and the search for an efficient platform for the visualization and automatic analysis of read alignment (mapping) data, it became evident that no available solution offered all required functionalities. For this reason, two main subjects were defined for this dissertation.
Besides the P. aeruginosa pan genome analysis, a novel read mapping visualization and analysis software was developed and published in the journal Bioinformatics. This software, ReadXplorer, is partly based upon a prototype named VAMP, which was developed during a preceding master's thesis at the Center for Biotechnology of Bielefeld University. The software was developed into a comprehensive, user-friendly platform augmented with several newly developed and implemented automatic read mapping analyses. Two examples of these are the transcription start site detection and the single nucleotide polymorphism detection. Moreover, new intuitive visualizations were added and the existing ones were greatly enhanced. ReadXplorer is designed to support not only DNA-seq data as accrued in the P. aeruginosa experiments, but also any kind of standard read mapping data as obtained from RNA-seq or ChIP-seq experiments. The data management was designed to meet the performance and efficiency demands of large next-generation sequencing data sets. Finally, ReadXplorer was extended to handle eukaryotic read mapping data as well.
Together with other software, ReadXplorer was then used to analyze different comparative genomics aspects of P. aeruginosa and to draw conclusions regarding the development of its pathogenicity. The experiments conducted include phylogeny and gene set determination, analysis of regions of genomic plasticity and identification of single nucleotide polymorphisms. The results were published in the journal Environmental Biology.