The early Drosophila embryo is a model system for studying the transcriptional control of development. Development of the embryo requires precise, coordinated control of gene expression in both space and time. This complex transcriptional regulation is thought to be achieved by the combinatorial action of multiple transcription factors binding to modular cis-regulatory DNA sequences. The transcription factors Bicoid (BCD), Caudal (CAD), Hunchback (HB), Kruppel (KR), and Knirps (KNI) act at very early stages of Drosophila development and are crucial for patterning the anterior-posterior axis of the embryo.
In prior studies, 37 targets predicted from the binding motifs of these five transcription factors were tested functionally. A positive training set of 15 sequences and a negative training set of 18 sequences have been constructed for embryonic enhancer prediction.
Clustering of transcription factor binding sites is the traditional approach to cis-regulatory element prediction, but it has several drawbacks.
In contrast to popular clustering approaches, my proposed method uses paired motifs to distinguish enhancers from non-functional elements. Applied genome-wide, this paired-motif approach achieves high specificity (94 percent) and a sensitivity of 60 percent. Paired-motif prediction performs better than single-motif prediction with respect to motif weight, separation of the positive and negative training sets, and the total number of predicted enhancers.
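To illustrate the paired-motif idea, the following is a minimal sketch and not the implementation used in this work. It assumes that a motif pair is counted whenever two binding sites start within a maximum distance of each other, and that each pair is weighted by the log ratio of its length-normalized frequency in the positive and negative training sets. The motif patterns, the distance cutoff, the pseudocount, and all function names are hypothetical.

```python
import math
import re

def find_sites(seq, motifs):
    """Return sorted (position, motif_name) tuples for every motif match."""
    sites = []
    for name, pattern in motifs.items():
        # A lookahead is used so that overlapping matches are also reported.
        for m in re.finditer(f"(?={pattern})", seq):
            sites.append((m.start(), name))
    return sorted(sites)

def count_pairs(seq, motifs, max_dist):
    """Count motif pairs whose start positions are at most max_dist bp apart."""
    sites = find_sites(seq, motifs)
    counts = {}
    for i, (pos_i, name_i) in enumerate(sites):
        for pos_j, name_j in sites[i + 1:]:
            if pos_j - pos_i > max_dist:
                break  # sites are sorted, so later ones are even further away
            pair = tuple(sorted((name_i, name_j)))
            counts[pair] = counts.get(pair, 0) + 1
    return counts

def pair_weights(pos_seqs, neg_seqs, motifs, max_dist, pseudo=1.0):
    """Assumed weighting: log ratio of per-base pair frequency, positive vs negative."""
    def totals(seqs):
        total = {}
        for seq in seqs:
            for pair, n in count_pairs(seq, motifs, max_dist).items():
                total[pair] = total.get(pair, 0) + n
        return total

    pos, neg = totals(pos_seqs), totals(neg_seqs)
    pos_len = sum(len(s) for s in pos_seqs)
    neg_len = sum(len(s) for s in neg_seqs)
    weights = {}
    for pair in set(pos) | set(neg):
        pos_rate = (pos.get(pair, 0) + pseudo) / pos_len
        neg_rate = (neg.get(pair, 0) + pseudo) / neg_len
        weights[pair] = math.log(pos_rate / neg_rate)
    return weights

def score(seq, weights, motifs, max_dist):
    """Score a sequence as the weighted sum of its motif-pair occurrences."""
    pairs = count_pairs(seq, motifs, max_dist)
    return sum(weights.get(pair, 0.0) * n for pair, n in pairs.items())

# Toy patterns for illustration only; they are not real binding motifs.
toy_motifs = {"A": "TTAAT", "B": "CCGG"}
toy_weights = pair_weights(["TTAATGGCCGG"], ["ACGTACGTACG"], toy_motifs, max_dist=10)
toy_score = score("TTAATGGCCGG", toy_weights, toy_motifs, max_dist=10)
```

Under these assumptions, a sequence is called an enhancer candidate when its score exceeds a cutoff chosen to separate the positive from the negative training set.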
The availability of Gene Ontology information improves genome-wide prediction by enabling a subset prediction method that restricts the search to regions flanking embryonic genes; more candidates are included while good specificity is maintained.
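The subset restriction can be pictured as a simple interval filter. The sketch below is illustrative only: it assumes that genes annotated as embryonic are given as (chromosome, start, end) tuples, that candidates are (chromosome, position) tuples, and that the search window covers the gene plus a fixed flank on each side. The flank size, the coordinates, and the function names are hypothetical.

```python
def flanking_windows(genes, flank):
    """Extend each (chrom, start, end) gene by `flank` bp on both sides."""
    return [(chrom, max(0, start - flank), end + flank)
            for chrom, start, end in genes]

def in_any_window(chrom, pos, windows):
    """True if the position lies inside at least one window on the same chromosome."""
    return any(c == chrom and s <= pos < e for c, s, e in windows)

def restrict_candidates(candidates, genes, flank):
    """Keep only candidates located in or near the given genes."""
    windows = flanking_windows(genes, flank)
    return [cand for cand in candidates
            if in_any_window(cand[0], cand[1], windows)]

# Hypothetical coordinates for illustration.
embryonic_genes = [("2L", 1000, 2000)]
candidates = [("2L", 600), ("2L", 3000), ("3R", 1200)]
kept = restrict_candidates(candidates, embryonic_genes, flank=500)
```

Because the search space is smaller, a more permissive score cutoff can be applied within these windows, which is one way to read the gain in candidates described above.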
The genomes of multiple Drosophila species provide an excellent basis for comparative analysis. I present a dynamic search approach that is unbiased with respect to sequence conservation and therefore has the potential to find non-conserved enhancers.
In total, I predicted 135 enhancers in D. melanogaster, including 37 novel and 27 known enhancers. The remaining 71 predicted enhancers overlap with experimentally verified binding regions but not with characterized enhancers; they are likely to be functional elements and are good candidates for experimental validation.
Additionally, I confirmed that enhancer elements are subject to rapid evolutionary change. First, the number of enhancers varies widely across Drosophila species. Second, the positions of embryonic enhancers are independent of sequence conservation. Third, motif rearrangement in homologous enhancers is frequent and rapid. Fourth, enhancer gain and loss analysis shows that nine enhancers have been gained in D. melanogaster during evolution. From this observation I speculate that embryonic enhancers can originate from non-functional sequence.
This prediction method has been shown to work on embryonic enhancers as well as on PRE/TREs; its effectiveness outside the embryonic domain, or in other species, therefore merits further research. The prediction results are available online at http://bibiserv.techfak.uni-bielefeld.de/jpred_en.