Timm, Wiebke: Peak intensity prediction in mass spectra using machine learning methods. 2008
Contents
1 Introduction
1.1 Outline
1.2 Publications
2 Proteomics and mass spectrometry
2.1 Proteins and peptides: an overview
2.1.1 Post-translational modifications
2.2 Inside the machine - mass spectrometry
2.2.1 Ion source
2.2.2 Mass analyzers
2.2.3 Detector
2.2.4 Tandem MS (MS/MS)
2.3 Protein separation techniques
2.4 LC coupled to ESI or MALDI - principle and differences
2.5 Computational methods for proteomics
2.5.1 Identification of proteins with MS - qualitative proteomics
2.5.2 Quantitative MS
2.5.3 Peak intensities
2.5.4 Relative and absolute quantification
3 Machine learning - methods and validation
3.1 Supervised learning
3.2 Linear regression
3.2.1 Properties
3.2.2 Implementations
3.3 Support vector machines
3.3.1 SVM for regression
3.3.2 Properties of SVR
3.3.3 Further reading and implementations
3.4 Random forests
3.4.1 Further reading and implementations
3.5 Shrinkage methods
3.6 Feature subset selection
3.6.1 Forward stepwise selection
3.6.2 Shrinkage methods for feature selection
3.6.3 Implementations
3.7 The two-sample t-test
3.8 Model evaluation
3.8.1 Structural risk minimization
3.8.2 Cross-validation
3.8.3 The Bayesian and Akaike information criteria
3.9 Pitfalls in statistical learning
4 Scope of this work
4.1 Related work
4.2 MS setups relevant to this work
5 Modeling the mass spectrometry process
5.1 The aim of the model
5.2 Steps and techniques
5.3 Sources of noise and errors
5.4 Accuracy enhancement of absolute quantitation with predicted peak intensities
6 Data acquisition, processing, and analysis
6.1 MALDI-TOF data
6.1.1 Wet lab procedures
6.1.2 In silico preprocessing
6.1.3 Construction of data sets
6.1.4 Normalization for MALDI datasets
6.1.5 Statistical analysis
6.2 LC-ESI data
6.2.1 Wet lab procedures
6.2.2 In silico preprocessing
6.2.3 Dataset construction
6.2.4 Normalization for LC-ESI data
6.2.5 Statistical analysis
6.3 Summary
7 Peak intensity prediction
7.1 Representation of Peptides
7.1.1 Computer scientist's paradigm: Peptides are strings
7.1.2 Biochemist's paradigm: Peptides are molecules
7.2 Statistical properties of the feature spaces
7.2.1 Correlated features
7.2.2 Frequency of dimers and trimers
7.3 Prediction
7.3.1 Methods
7.3.2 MALDI dataset prediction results
7.3.3 Results for LC-ESI data
7.3.4 Summary
8 Feature selection
8.1 Methods
8.2 Integration of selected features from different methods
8.2.1 Evaluation of the feature selection comparison for MALDI datasets
8.3 Detailed results of the different methods
8.3.1 Forward stepwise selection
8.3.2 Random forests for feature importance assessment
8.3.3 L1-penalized methods for feature selection
Feature selection by least-angle regression
Feature selection by L1-penalized generalized linear models
8.3.4 t-test in the seq feature set
8.4 Summary
9 Extended analysis
9.1 Detailed results
9.1.1 Analysis of error behavior
9.1.2 Are the predicted intensities the true signal?
9.1.3 Duplicate peptides in training and test set
9.1.4 Linear model
9.1.5 Unlabeled data
10 Conclusion
A Additional information
A.1 Notations
A.2 Abbreviations used for dataset variants
A.3 Implementation details
A.4 The official IUPAC amino acid codes
A.5 Overview of prediction results obtained in this work
B Glossary