Beckstette, Michael: Index-based algorithms for motif search and their integration in a system for differential genome analysis. 2007
Contents
Introduction
The continuing challenge of biosequence analysis
Structure of this thesis
Modeling concepts for sequence motifs and consensi
Basic definitions and nomenclature
Motifs, domains, and sequence families
Motif finding
Regular expressions as motif descriptors
Consensus strings
Prosite patterns: Regular expressions for protein family assignment
Position specific scoring matrices
From alignment blocks to PSSMs
Sequence weighting procedures
Basic PSSM construction principles
PSSMs based on odds ratios
Average score methods
Explicit log-odd score methods
Construction of amino acid PSSMs in the BLOCKS database
Wu's minimal risk scoring matrices
Construction of nucleotide PSSMs in the TRANSFAC database
Gribskov's profile model
Hidden Markov models
Foundations of hidden Markov model theory
Profile hidden Markov models
Profile HMM collections for sequence annotation and classification
Concluding remarks on sequence motif models
Fast algorithms for matching position specific scoring matrices
Introduction
Pattern matching with PSSMs
Improved running time through the usage of lookahead scoring
Permuted lookahead scoring
PSSM searching using suffix trees
Dorohonceanu's algorithm
PSSM searching using enhanced suffix arrays: The ESAsearch algorithm
Analysis
Further performance improvements via alphabet transformations
Reduced amino acid alphabets
A unifying view on SPsearch, LAsearch, and ESAsearch
Finding an appropriate threshold for PSSM searching
Probabilities and expectation values
Calculation of exact PSSM score distributions
Evaluation with dynamic programming
Restricted probability computation
Lazy evaluation of the permuted matrix
Threshold independent PSSM matching: The k-best algorithm
Implementation and computational results
PoSSuM software distribution
Discussion and concluding remarks
PSSM family models for sequence family classification
Increasing the expressiveness of PSSM-based database searches
Using multiple ordered PSSMs for sequence classification
PSSM family models
Computation of optimal PSSM chains
Integration of PSSM family models into PoSSuMsearch
Performance of PSSM family models for protein family classification
Employed data set and evaluation scenarios
Model construction and scoring
Performance evaluation and results
The significance of PSSM chain scores
Accelerating HMM based database searches with PSSM family models
Model specific trusted- and noise cutoffs
PSfamSearch: Search space reduction with PSSM family models
Evaluation and computational results
Cutoff calibration strategies
Discussion and concluding remarks on performed experiments
Comparison of pHMMs and PSSM family models
Genlight - a system for interactive, high-throughput, differential genome analysis
Motivation
Genome annotation systems: Related concepts with different focus
Requirement definitions and design goals
System architecture and implementation
Concepts and functionality
The set oriented concept
Operations on Seq-sets and Hit-sets
Integrated sequence analysis methods
Integrated protein domain and family databases
Supported protein classification schemes
Gene ontologies: a unifying vocabulary for cross database queries
User defined sequence databases
Asynchronous distributed execution of sequence analysis tasks
Database schema
The internal sequence identifier concept
The handiness of the set oriented concept
More complex queries using computed sequence attributes
Genlight as a data warehouse
The Genlight user interface
Genlight case studies
Detection and analysis of the Smh gene family in maize
Analysis of Xenopus laevis expressed sequence tag clusters
Identification of potential drug targets in Helicobacter pylori
Concluding remarks on Genlight
Potential future developments and system extensions
Conclusions and prospects
Concluding remarks
Prospects
Appendix
The 20 letter amino acid alphabet
PROSITE pattern entry
PoSSuMsearch command line interface: Quick reference
The PoSSuM software distribution
File formats
PoSSuMsearch
PoSSuMdist
PoSSuMfreqs
PSSM converters
Using the PoSSuM software distribution
Messages and warnings
Predefined Hit-set filters in the Genlight system
Bibliography