The repetitive patterns of genomic DNA reveal much about the structure and organization of genomes. REPuter is a bioinformatics tool that efficiently finds repetitive substrings in large genomic sequences. The identification of repeats enables a wide range of biological interpretations. Here we describe how this single program is applied to many different biological problems: assembly checking, localization of low-copy repeats, identification of unique strings, matching of cDNAs onto genomic sequences, and comparison of gene structures. We also describe how REPuter was adapted to the era of comparative genomics, focusing on its biologically meaningful improvements, which resulted in the new visualization tool GenAlyzer.
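To make the notions of "repetitive substring" and "string uniqueness" concrete, the following minimal sketch indexes every substring of a fixed length k and reports those that occur more than once. It is an illustration only and makes no claim to reflect REPuter's internal algorithm or its efficiency; the function names `find_exact_repeats` and `is_unique` and the parameter `k` are hypothetical.

```python
from collections import defaultdict


def find_exact_repeats(seq, k):
    """Return {k-mer: [start positions]} for every k-mer seen at least twice.

    Naive illustration: memory grows with the number of distinct k-mers,
    so this is only suitable for short sequences.
    """
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}


def is_unique(seq, query):
    """Return True if query occurs exactly once in seq.

    Overlapping occurrences are counted, so the search restarts one
    position after each hit.
    """
    count = 0
    start = seq.find(query)
    while start != -1:
        count += 1
        if count > 1:
            return False
        start = seq.find(query, start + 1)
    return count == 1
```

For example, `find_exact_repeats("ACGTACGTTT", 4)` reports the 4-mer `ACGT` at positions 0 and 4, while `is_unique("ACGTACGTTT", "GTAC")` holds because `GTAC` occurs only once. Only exact forward repeats are covered here.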
The ability to compare different genomes enables researchers to look for conservation and functionality of regulatory regions. Because sequences containing vital information are under greater evolutionary pressure than sequences without function, the former are expected to be more conserved during evolution. This clearly holds for protein-coding sequences and can also be exploited in the study of regulatory or other noncoding functional sequences. Using comparative genomics, these noncoding functional sequences can be identified as conserved noncoding sequences (CNSs). Today, the functionality of CNSs is determined by very elaborate and time-consuming experimental methods. Depending on the background level of similarity between the organisms being compared, the number of conserved noncoding sequences can be very large and almost impractical to handle in the wet lab. Making use of available tools for genome annotation and comparison, we have developed Connosseur, a Conserved Noncoding Sequences Repository Generator. It provides bioinformatics support for generating and screening the set of CNSs between two genomic sequences. Connosseur automates several tools in a computational cascade, finally returning a repository of CNSs and associated information. For data storage, we used the relational database system PostgreSQL. Further analyses can be carried out on selected potential CNSs (pCNSs). These include determining uniqueness, analysis of overrepresented words, and comparison to known functional CNSs. This approach allowed us to identify several pCNSs between mouse and human genomic sequences. Overall, Connosseur provides a flexible and extensible basis for in-depth studies of pCNSs.
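The core idea of deriving noncoding candidates from a genome comparison can be sketched as an interval subtraction: take the regions reported as conserved and remove whatever overlaps annotated coding sequence. The sketch below is an assumption-laden illustration, not Connosseur's actual cascade; the function name `subtract_intervals`, the half-open coordinate convention, and the `min_len` threshold are all hypothetical.

```python
def subtract_intervals(conserved, coding, min_len=1):
    """Return the parts of `conserved` not covered by any `coding` interval.

    All intervals are half-open (start, end) pairs on one sequence.
    Fragments shorter than `min_len` are discarded. (Hypothetical helper,
    not part of Connosseur.)
    """
    coding = sorted(coding)
    result = []
    for start, end in sorted(conserved):
        cursor = start  # leftmost position not yet classified
        for c_start, c_end in coding:
            if c_end <= cursor:
                continue  # coding interval lies entirely to the left
            if c_start >= end:
                break  # sorted by start, so no later interval can overlap
            if c_start > cursor:
                result.append((cursor, c_start))  # noncoding gap
            cursor = max(cursor, c_end)
            if cursor >= end:
                break
        if cursor < end:
            result.append((cursor, end))  # noncoding tail
    return [(s, e) for s, e in result if e - s >= min_len]
```

With conserved regions `[(100, 200), (300, 350)]` and coding annotations `[(150, 170), (340, 400)]`, the remaining candidates are `(100, 150)`, `(170, 200)`, and `(300, 340)`. Raising `min_len` to 40 drops the 30-base fragment `(170, 200)`.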