The field of forensic DNA typing has advanced rapidly over the past decades. Nowadays, short tandem repeats (STRs) are the markers of choice for identifying the donor of biological evidence. This class of genetic variation consists of tandemly repeated elements that are highly variable in both the number of repeat units and the repeat sequence. Analysis of length polymorphisms at STR loci currently relies almost exclusively on PCR amplification and subsequent fragment sizing by capillary electrophoresis, a “gold standard” with certain limitations in resolving STR alleles and separating artificial products. With the ongoing advancement of DNA sequencing, the forensic community is exploring the opportunities of massively parallel sequencing (MPS) for high-resolution forensic DNA typing. MPS makes it possible to characterize biological evidence in unprecedented detail. Attractive features of STR sequencing include greater discrimination power than electrophoretic sizing and the ability to analyze a wide range of forensic markers in a single assay. Despite these benefits, the application of MPS to routine casework poses new challenges, which this dissertation addresses.
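
The gain in discrimination power comes from alleles that have the same length, and therefore the same electrophoretic size, but differ in repeat sequence. The following minimal Python sketch illustrates the idea. It assumes a hypothetical repeat region with a 4-bp motif that starts in phase with the sequence and has no flanking sequence. The sequences and the helper names `length_allele` and `bracket_notation` are illustrative only and do not come from monSTR or toaSTR.

```python
def length_allele(seq: str, period: int = 4) -> str:
    """Length-based designation, as obtainable from fragment sizing (e.g. '9.3')."""
    full, rest = divmod(len(seq), period)
    return f"{full}.{rest}" if rest else str(full)


def bracket_notation(seq: str, period: int = 4) -> str:
    """Sequence-based designation: collapse runs of identical motifs, e.g. [TCTA]8 [TCTG]3."""
    blocks = []
    i = 0
    while i + period <= len(seq):
        motif = seq[i:i + period]
        count = 0
        # Count consecutive copies of the current motif.
        while seq[i:i + period] == motif:
            count += 1
            i += period
        blocks.append(f"[{motif}]{count}")
    if i < len(seq):
        blocks.append(seq[i:])  # trailing partial repeat, if any
    return " ".join(blocks)


# Two hypothetical alleles of identical length but different repeat structure.
allele_a = "TCTA" * 8 + "TCTG" * 3
allele_b = "TCTA" * 9 + "TCTG" * 2

print(length_allele(allele_a), length_allele(allele_b))  # 11 11
print(bracket_notation(allele_a))  # [TCTA]8 [TCTG]3
print(bracket_notation(allele_b))  # [TCTA]9 [TCTG]2
```

Both alleles would be reported as "11" after electrophoretic sizing, but sequencing separates them into two distinct alleles.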
The monSTR identity panel was designed in response to the demand for a medium-sized STR assay on the Illumina MiSeq platform and targets 21 forensically important markers, including the highly discriminative SE33 locus. This part of the dissertation describes the construction of a custom forensic MPS-STR assay, from primer engineering to the optimization of thermocycling conditions. The Design of Experiments methodology pioneered in this context enables experimentally practical and economically justifiable assay optimization. Statistical modeling provided valuable insights into the characteristics of the monSTR assay. Joint optimization of multiple process parameters resulted in a high-fidelity identity panel, characterized by well-balanced amplification of STR loci, a high on-target ratio of sequence reads, and reduced stutter formation compared with standard PCR conditions. Developmental validation studies following established forensic guidelines explored the capabilities and limitations of this novel identity panel. A key finding was that monSTR generates complete and reproducible genotypes even from minute amounts of input DNA. The results also demonstrated that STR alleles of multiple contributors in imbalanced mixtures can be accurately resolved.
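
In Design of Experiments, several process parameters are varied jointly according to a planned design, rather than one factor at a time. The minimal Python sketch below builds a two-level full factorial design and estimates a main effect. The factor names and levels are hypothetical placeholders and are not the parameters or settings used for monSTR. Practical studies often use fractional factorial or response-surface designs instead, which reduce the number of runs.

```python
from itertools import product

# Hypothetical thermocycling factors with (low, high) levels; illustrative only.
FACTORS = {
    "annealing_temp_C": (58.0, 62.0),
    "cycles": (25, 30),
    "extension_time_s": (30, 60),
}


def full_factorial(factors: dict) -> list:
    """Return every combination of coded levels (-1/+1) with the matching real settings."""
    names = list(factors)
    runs = []
    for coded in product((-1, 1), repeat=len(names)):
        settings = {n: factors[n][0 if c < 0 else 1] for n, c in zip(names, coded)}
        runs.append((coded, settings))
    return runs


def main_effect(coded_runs: list, responses: list, factor_index: int) -> float:
    """Mean response at the high level minus mean response at the low level of one factor."""
    high = [y for c, y in zip(coded_runs, responses) if c[factor_index] == 1]
    low = [y for c, y in zip(coded_runs, responses) if c[factor_index] == -1]
    return sum(high) / len(high) - sum(low) / len(low)


runs = full_factorial(FACTORS)
print(len(runs), "runs")  # 8 runs for 3 factors at 2 levels
```

Each run would then be performed in the laboratory. A measured response, such as a hypothetical locus-balance score, would be passed to `main_effect` to rank the factors before refining the model.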
The bioinformatic analysis of STR sequencing data is one of the main bottlenecks for integrating MPS into standard casework laboratories. This dissertation introduces toaSTR, a novel open-access web application that translates raw sequencing data into genetic profiles. The software engineering chapter provides insights into the bioinformatics algorithms and the composition of the application's components. A novel stutter model proposed herein predicts and identifies artificial products that arise from the analytical process. Sequence observations are automatically classified to assist in the interpretation of complex samples. Evidence from multiple studies has shown that toaSTR accurately identifies alleles in data obtained with various MPS platforms and identity panels. By emphasizing usability and versatility, toaSTR simplifies access to MPS data analysis for DNA laboratories without in-depth bioinformatics expertise.
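
To show what stutter identification involves in principle, the Python sketch below flags the simplest artifact type, the n-1 stutter. This is a sequence that equals a more abundant sequence with one repeat unit deleted. This is a generic textbook heuristic, not the stutter model implemented in toaSTR. The 4-bp period, the 0.15 ratio threshold, and the function names are hypothetical assumptions.

```python
def minus_one_stutters(parent: str, period: int = 4) -> set[str]:
    """All sequences obtained by deleting one repeat unit inside a tandem run of `parent`."""
    stutters = set()
    for i in range(len(parent) - 2 * period + 1):
        # Only delete a unit that is immediately followed by an identical copy.
        if parent[i:i + period] == parent[i + period:i + 2 * period]:
            stutters.add(parent[:i] + parent[i + period:])
    return stutters


def flag_stutters(counts: dict[str, int], max_ratio: float = 0.15,
                  period: int = 4) -> dict[str, str]:
    """Label each observed sequence 'stutter' or 'allele' using read counts (hypothetical rule)."""
    labels = {}
    for seq, n in counts.items():
        label = "allele"
        for parent, n_parent in counts.items():
            if n_parent <= n:
                continue  # a stutter product must be less abundant than its parent
            if seq in minus_one_stutters(parent, period) and n <= max_ratio * n_parent:
                label = "stutter"
                break
        labels[seq] = label
    return labels


# Hypothetical read counts: alleles with 10 and 12 repeats plus their n-1 products.
observed = {"TCTA" * 12: 900, "TCTA" * 10: 1000, "TCTA" * 11: 100, "TCTA" * 9: 90}
for seq, label in flag_stutters(observed).items():
    print(len(seq) // 4, label)
```

A fixed ratio threshold like this one cannot distinguish a stutter product from a minor contributor's allele that occupies the same position. This is one reason why imbalanced mixtures call for more careful modeling than this sketch provides.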