The analysis of RNA secondary structure has become increasingly important over the last decades, after it was recognised that RNA serves not only as a passive messenger (mRNA) but also as a functional component of the cell. Furthermore, it was found that the function of such non-protein-coding RNA is determined mainly by its structure rather than its sequence. This means that two RNA molecules with low sequence similarity but high structural similarity are likely to have a similar function.
The prediction of RNA secondary structure is based on parameters that have been measured in vitro. These parameters are therefore rather static and do not reflect the dynamically changing environment in living organisms. Nevertheless, the use of these parameters, which are summarised in the energy model, has given valuable results, especially for short sequences. Several refinements over the years have improved the predictions, but the calculated optimal structure is still not guaranteed to correspond to the native one. For this reason, and because the native structure is nevertheless feasible under the energy model, it is common practice to additionally calculate suboptimal structures and include them in the study. The set of all suboptimal structures is referred to as the structure space, which holds the information needed to answer questions such as: Is the optimal structure also the native one? Is there more than one structure that an RNA molecule can adopt? How well-defined is the optimal structure?
Major problems in the analysis of the structure space are its size and its shape. The number of suboptimal structures grows exponentially with the sequence length, so that even for sequences of moderate length it quickly exceeds several billion. Besides the size, the shape of the structure space complicates its study. The structure space can be imagined as a rough landscape with valleys, which hold locally optimal structures, separated by mountains and saddles. This landscape is not smooth but rugged and complex, which prevents the development of a visualisation that is both practical and intuitive.
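The exponential growth mentioned above can be illustrated with a short Python sketch. It counts all secondary structures (without pseudoknots) on a chain of n bases using the classical combinatorial recurrence, under two simplifying assumptions that are not taken from this work: any two bases may pair, and a hairpin loop must enclose at least `min_loop` unpaired bases. The result is therefore an upper bound for a concrete sequence, and the function name and parameters are purely illustrative.

```python
def count_structures(n: int, min_loop: int = 3) -> int:
    """Count pseudoknot-free secondary structures on n bases.

    Assumptions (illustrative only): every base may pair with every other
    base, and a pair (i, j) needs at least `min_loop` bases between i and j.
    """
    if n < 0 or min_loop < 0:
        raise ValueError("n and min_loop must be non-negative")

    # counts[length] = number of structures on a chain of that length.
    # Chains too short to hold any base pair have exactly one structure
    # (the open chain), so the table starts out filled with ones.
    counts = [1] * (n + 1)

    for length in range(min_loop + 2, n + 1):
        # Case 1: the last base is unpaired.
        total = counts[length - 1]
        # Case 2: the last base pairs with base k (1-based). The pair splits
        # the chain into k - 1 bases to the left and length - k - 1 bases
        # enclosed by the pair; the enclosed part must respect min_loop.
        for k in range(1, length - min_loop):
            total += counts[k - 1] * counts[length - k - 1]
        counts[length] = total

    return counts[n]


if __name__ == "__main__":
    for n in (10, 20, 40, 80):
        print(n, count_structures(n))
```

Because the table is filled in a single pass with Python's arbitrary-precision integers, the count can be obtained even for lengths at which enumerating the structures themselves would be infeasible.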
Visualisation is generally not the goal of structure space analysis, but the complexity of the landscape also hampers approaches that try to derive specific features hidden in the structure space. Despite these problems, several tools exist that analyse the complete structure space, or at least a part of it, to answer the aforementioned questions. These include, among others, MFOLD, which produces a subset of all possible structures according to a threshold of structural similarity; SFOLD, which samples structures in a probabilistic fashion and provides a method to identify alternating structures; RNAsubopt, which produces all suboptimal structures within a given energy threshold; and barriers, which identifies the valleys, mountains and saddles of the structure landscape.
My contribution to this area of research is twofold. First, I present paRNAss (prediction of alternating RNA secondary structures), which focuses on the detection of conformational switches and analyses the structure space based on pairwise comparisons. paRNAss has been available since 1997, and I have improved both its predictive power and its speed, which made a systematic evaluation possible. This evaluation showed that paRNAss can even be used to identify more than two competing structures and hence to gain deeper insight into the structure space. The second tool I introduce is RNAshapes, which supports several kinds of analysis. The algorithm uses abstract representations of secondary structure to compute only those structures that are morphologically dissimilar, i.e. composed of different structural elements. Morphologically similar structures are pooled into a class, and each class is represented by its best member. The list of these representatives gives a general overview of the structure space. In addition, I introduce an algorithm to compute the probabilities of these classes of structures. These probabilities hint at properties such as alternating secondary structures (two classes with similar probabilities) and structural well-definedness (one class with very high probability).
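The idea of pooling structures into classes and assigning each class a probability can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the RNAshapes algorithm itself: the abstraction used here simply drops unpaired bases and merges helices that are nested directly inside one another, "best member" is taken to mean the member with the lowest free energy, and the probabilities are Boltzmann weights normalised over an explicitly given list of structures rather than over the full structure space. The function names, the "_" symbol for the open chain, and the dot-bracket structures and energies in the usage part are all hypothetical.

```python
import math

GAS_CONSTANT = 0.0019872  # kcal / (mol * K)


def abstract_shape(dot_bracket: str) -> str:
    """Map a dot-bracket string to a coarse shape string such as '[[][]]'."""
    levels = [[]]  # levels[-1] collects the shapes of helices closed so far
    for ch in dot_bracket:
        if ch == "(":
            levels.append([])
        elif ch == ")":
            if len(levels) == 1:
                raise ValueError("unbalanced ')'")
            inner = levels.pop()
            # A pair enclosing exactly one helix continues that helix
            # (stacking, bulge or internal loop), so no new bracket is added.
            shape = inner[0] if len(inner) == 1 else "[" + "".join(inner) + "]"
            levels[-1].append(shape)
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if len(levels) != 1:
        raise ValueError("unbalanced '('")
    return "".join(levels[0]) or "_"  # '_' denotes the open chain here


def shape_classes(structures, temperature_c: float = 37.0):
    """Group (dot_bracket, energy) pairs by shape.

    Returns {shape: (representative, probability)}, where the representative
    is the lowest-energy member and the probability is the class share of
    the Boltzmann weight of the given list only.
    """
    if not structures:
        raise ValueError("need at least one structure")
    rt = GAS_CONSTANT * (temperature_c + 273.15)
    best, weight, total = {}, {}, 0.0
    for dot_bracket, energy in structures:
        shape = abstract_shape(dot_bracket)
        w = math.exp(-energy / rt)
        total += w
        weight[shape] = weight.get(shape, 0.0) + w
        if shape not in best or energy < best[shape][1]:
            best[shape] = (dot_bracket, energy)
    return {s: (best[s], weight[s] / total) for s in best}


if __name__ == "__main__":
    # Hypothetical structures and energies (kcal/mol), for illustration only.
    toy = [
        ("((((((....))))))....", -3.0),
        (".(((((....))))).....", -2.5),
        ("(((...)))..(((...)))", -2.8),
        ("((...))...((....))..", -1.0),
        ("....................", 0.0),
    ]
    for shape, (rep, prob) in shape_classes(toy).items():
        print(f"{shape:6s} {rep[0]} {rep[1]:5.1f} {prob:.3f}")
```

In this toy setting, two classes with similar probabilities would point towards alternating structures, while a single dominant class would indicate a well-defined structure. A practical method has to obtain the class probabilities without enumerating every structure, which this sketch does not attempt.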