In this thesis, we present new, efficient index-based algorithms for searching large sequence sets with position-specific scoring matrices (PSSMs), a well-known motif model, and their integration into an interactive system capable of large-scale differential comparative genome analyses. The newly developed and implemented index-based algorithms for searching with PSSMs clearly outperform existing methods in terms of running time. We also demonstrate how index-based PSSM searching, in combination with a fragment-chaining approach, can be used for efficient protein family classification and for speeding up computation-intensive database searches with hidden Markov models. With the PoSSuM software distribution, we also provide implementations of the presented algorithms in the form of a flexible command-line tool.
We further integrated our newly developed algorithm possumsearch as a database search method into GENLIGHT, our integrated high-throughput sequence analysis system, which is also a contribution of this work. GENLIGHT offers an interactive, user-friendly environment suited to biologists for a variety of large-scale sequence analysis tasks, with a special focus on (differential) comparative genome analyses. It employs a set-oriented operational model that allows generated results to be reused and complete analysis workflows to be performed interactively. The system integrates several widely used sequence analysis methods and databases in a common environment and, by employing a distributed client-server approach, is capable of performing analyses on a complete genome or proteome scale, even for non-index-based analysis methods. We demonstrate the practical usability of GENLIGHT with different case studies in which the system was used and which led to substantial new scientific findings.