Ensemble approach combining multiple methods improves human transcription start site prediction

Title record

Title
Ensemble approach combining multiple methods improves human transcription start site prediction
Authors
Dineen, David G. ; Schroeder, Markus ; Higgins, Desmond G. ; Cunningham, Padraig
Published in
BMC Genomics, vol. 11, no. 1
Year of publication
2010
Language
English
Document type
Journal article
ISSN
1471-2164
URN
urn:nbn:de:0070-pub-20035110
DOI
10.1186/1471-2164-11-677

Abstract

Background: The computational prediction of transcription start sites is an important unsolved problem. Some recent progress has been made, but many promoters, particularly those not associated with CpG islands, are still difficult to locate using current methods. These methods use different features and training sets, along with a variety of machine learning techniques, and therefore produce different prediction sets.

Results: We demonstrate the heterogeneity of current prediction sets, and take advantage of this heterogeneity to construct a two-level classifier ('Profisi Ensemble') using predictions from 7 programs, along with 2 other data sources. Support vector machines using 'full' and 'reduced' data sets are combined in an either/or approach. We achieve a 14% increase in performance over the current state-of-the-art, as benchmarked by a third-party tool.

Conclusions: Supervised learning methods are a useful way to combine predictions from diverse sources.
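
The two-level, either/or combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each candidate site is described by 9 binary features (7 program outputs plus 2 other data sources), that the 'reduced' data set is a subset of those features, and that "either/or" means a site is called when either second-level model fires; the abstract specifies none of these details. The linear decision functions, weights, and feature indices are hypothetical stand-ins for trained support vector machines.

```python
from typing import Callable, List, Sequence

# A classifier maps a feature vector to True (TSS predicted) or False.
Classifier = Callable[[Sequence[float]], bool]


def linear_classifier(weights: Sequence[float], bias: float) -> Classifier:
    """Build a linear decision function (hypothetical stand-in for a trained SVM)."""
    def predict(features: Sequence[float]) -> bool:
        score = sum(w * x for w, x in zip(weights, features)) + bias
        return score > 0.0
    return predict


def either_or(
    full_model: Classifier,
    reduced_model: Classifier,
    full_features: Sequence[float],
    reduced_indices: Sequence[int],
) -> bool:
    """Call a site a TSS if either second-level model predicts one (assumed rule)."""
    reduced_features: List[float] = [full_features[i] for i in reduced_indices]
    return full_model(full_features) or reduced_model(reduced_features)


# Illustrative weights only: the 'full' model fires when at least 5 of the
# 9 first-level sources agree, the 'reduced' model when at least 2 of 3 do.
full_model = linear_classifier([1.0] * 9, bias=-4.5)
reduced_model = linear_classifier([1.0] * 3, bias=-1.5)
REDUCED_INDICES = [0, 1, 2]  # hypothetical choice of 'reduced' features

site_a = [1, 1, 0, 0, 0, 0, 0, 0, 0]  # only the reduced model fires
site_b = [0, 0, 0, 1, 1, 0, 0, 0, 0]  # neither model fires
site_c = [0, 0, 1, 1, 1, 1, 1, 0, 0]  # only the full model fires

calls = [
    either_or(full_model, reduced_model, site, REDUCED_INDICES)
    for site in (site_a, site_b, site_c)
]
```

In this sketch `calls` holds one Boolean per candidate site. Replacing the stand-in functions with classifiers trained on labelled sites would turn it into a supervised ensemble in the spirit of the abstract.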
