Various methods have been proposed for RNA secondary structure comparison, and new ones are still being developed. This suggests that no fully appropriate distance measure for structure comparison exists yet.
Such a distance measure should capture the relevant features of the secondary structure. It therefore also depends on the representation of the secondary structure, which may introduce artefacts into the distance computation. Our goal is to find a distance function that avoids artefacts caused by the representation and is based on a reasonable representation of the secondary structure. After a discussion of common distance functions for RNA secondary structures, we focus on the forest alignment distance, which represents secondary structures in a natural way with regard to the nesting and adjacency relations of substructures.
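To illustrate how a forest representation can reflect nesting and adjacency, the following minimal sketch converts a sequence and its dot-bracket string into an ordered forest. It is an illustration under stated assumptions, not the data structure used in RNAforester: the names `Node`, `forest_from_dot_bracket` and `to_string` are hypothetical, and the convention that a pair node `P` has the two paired bases as its leftmost and rightmost children is assumed here.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A forest node: 'P' marks a base pair, any other label is a single base."""
    label: str
    children: list = field(default_factory=list)


def forest_from_dot_bracket(seq, structure):
    """Build an ordered forest from a sequence and its dot-bracket string.

    Assumed convention (hypothetical, for illustration): a base pair becomes
    a 'P' node whose first and last children are the two paired bases, and
    everything enclosed by the pair sits between them. Nesting becomes the
    parent/child relation, adjacency becomes the sibling order.
    """
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    root = Node("ROOT")            # artificial root holding the top-level forest
    stack = [root]
    for base, sym in zip(seq, structure):
        if sym == "(":
            pair = Node("P", [Node(base)])   # opening base is the leftmost child
            stack[-1].children.append(pair)
            stack.append(pair)
        elif sym == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced structure: unexpected ')'")
            stack.pop().children.append(Node(base))  # closing base is rightmost
        elif sym == ".":
            stack[-1].children.append(Node(base))    # unpaired base is a leaf
        else:
            raise ValueError(f"unexpected symbol {sym!r}")
    if len(stack) != 1:
        raise ValueError("unbalanced structure: missing ')'")
    return root.children


def to_string(forest):
    """Render a forest in a compact bracketed form for inspection."""
    return " ".join(
        n.label if not n.children else f"{n.label}({to_string(n.children)})"
        for n in forest
    )


print(to_string(forest_from_dot_bracket("GACUC", "(.)..")))
```

In this sketch, the hairpin-like pair G-C enclosing A becomes one tree, followed by two sibling leaves for the unpaired bases U and C.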
In the main part of this work, we extend the gap model of the forest alignment distance to support affine gap costs. This leads to a new variant of the algorithm, which is explained in this thesis and implemented in the new version of the tool, RNAforester 2.0.
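As a reminder of what affine gap costs mean, the sketch below contrasts a linear and an affine gap cost for the simpler case of a sequence gap of a given length. The gap model for forests is more involved and is not reproduced here; the function names, the convention that the opening cost covers the first gap position, and the parameter values are assumptions chosen for illustration only.

```python
def linear_gap_cost(length, gap_cost):
    """Linear model: every gap position costs the same."""
    return length * gap_cost


def affine_gap_cost(length, gap_open, gap_extend):
    """Affine model: opening a gap is expensive, extending it is cheap.

    Assumed convention: gap_open pays for the first gap position and each
    further position costs gap_extend.
    """
    if length < 0:
        raise ValueError("gap length must be non-negative")
    if length == 0:
        return 0
    return gap_open + (length - 1) * gap_extend


# Hypothetical parameters: one gap of length 4 versus two gaps of length 2.
one_long = affine_gap_cost(4, gap_open=10, gap_extend=1)
two_short = 2 * affine_gap_cost(2, gap_open=10, gap_extend=1)
print(one_long, two_short)
```

Under the affine model, one contiguous gap is cheaper than several scattered gaps of the same total length, whereas the linear model cannot distinguish the two cases.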
In addition, we provide a mechanism to speed up the alignment process by anchoring subalignments. The anchoring information is based on the overall shape of the molecule and is obtained by abstract shape analysis.
Another contribution is a discussion of the concept of well-formed RNA forest alignments. We adapt the case distinction in the recurrences that were designed to construct well-formed RNA forest alignments, so that the deletion and insertion of a pairing relation between two bases are handled appropriately.
The affine gap scoring scheme introduces an additional constant factor of ≈ 7 into the computation. Combined with suitable scoring parameters, it improves structure alignments in many cases. The anchoring of subalignments yields an average speedup by a factor of ≈ 3 compared to the computation without anchors, depending on the number and placement of the anchors.