The non-conventional methylotrophic yeast *Pichia pastoris* has become a firmly established host for recombinant protein production in both industry and academia. High product titers, an efficient secretory machinery and the ability to express complex proteins of bacterial to human origin have given *P. pastoris* an advantage over many other host systems. In recent years, its aptitude for foreign gene expression has also been applied in a rising number of metabolic engineering studies. However, scientists trying to create the optimal *P. pastoris* strain for their application face a challenge. Because of the high clonal variability, clones from a single transformation can exhibit wildly different expression levels, no detectable expression at all, or altered growth behavior. Consequently, a laborious screening process is needed to identify the desired strain among hundreds or thousands of clones. Surprisingly, only a few studies have tried to analyze clonal variability in *P. pastoris* so far. Although the connection between gene dosage and product titer has been investigated thoroughly, the underlying causes and mechanisms of clonal variability have remained unknown.<br />
In this project, we present the first systematic investigation into the clonal variability of *P. pastoris*, covering the genetic events we discovered and their impact on both recombinant protein production and growth behavior. By applying well-established standard methods for *P. pastoris* experiments, we aimed to provide results and insights that are relevant to other scientists working with this yeast. A library of 845 strains, transformed with an easy-to-detect reporter protein, was characterized for classic properties including colony morphology, gene dosage and productivity. This library is considerably larger than those analyzed in previous *P. pastoris* publications, exceeding them in size roughly 20- to 100-fold. Based on the characterization data, 31 strains with particularly peculiar features were selected for whole-genome sequencing. By combining the characterization and genome sequencing data, we discovered novel connections between integration events and strain properties.<br />
A clear correlation between cassette-to-cassette orientation and productivity was found. Additionally, a surprising ratio between the different orientation forms suggested the existence of two competing, mutually exclusive integration mechanisms. We also observed a rather high frequency of false-positive clones that all contained the same integration event. Our combinatorial approach enabled us to identify a surplus homologous sequence inside the expression cassette as the likely cause of this secondary integration event. This hypothesis was validated by optimizing the expression cassette, which eliminated the undesired integration event.
Besides productivity-related effects, we also analyzed strains that displayed a marked change in colony morphology. Multiple new non-canonical integration events were discovered in these strains. Off-target gene disruptions could be correlated with the change in colony morphology. In particular, the relocation of the knock-out target to a different chromosome and the subsequent gene disruption provided important insights for genetic engineering studies. In a number of clones we found *E. coli* DNA from the plasmid host, which had co-integrated in fusion with the expression cassette. Moreover, qRT-PCR experiments confirmed that the *E. coli* genes were transcriptionally active in *P. pastoris*.
Strikingly, the clonal variability also gave rise to a novel genetic tool for recombinant protein production in *P. pastoris*. In one strain with exceptionally good productivity, we found that a circular plasmid consisting of the expression cassette and mitochondrial DNA had formed. We validated its replicative capabilities and successfully used it to transform both *P. pastoris* and *Saccharomyces cerevisiae*. In *P. pastoris*, newly created pMito clones exhibited a highly uniform expression level that exceeded that of a reference strain carrying a single genomic copy of the expression cassette by up to fourfold.<br />
Taken together, our project provides scientists working with *P. pastoris* with important references for studies focused on recombinant protein production as well as on genetic or metabolic engineering. We thereby aim to promote the further development of this yeast and to aid the implementation of more complex genetic engineering strategies. Ways to reduce the frequency of low-producer strains enable streamlined screening procedures for high-producer strains. At the same time, the documentation of off-target integration events helps to devise strategies that prevent their occurrence and highlights events that should be assayed for in constructed strains. Lastly, the novel episomal vector we discovered displayed great potential, especially for protein engineering studies in which a large number of different target variants need to be assayed.