Novel Algorithms and Benchmarks for Computational Protein Design
- Author(s): Ollikainen, Noah
- Advisor(s): Kortemme, Tanja
- et al.
Computational protein design aims to predict protein sequences that will fold into a given three-dimensional structure and perform a desired function. Although computational protein design has produced significant accomplishments in the past several years, including the design of novel enzymes and protein-protein interactions, its accuracy remains relatively low, and many designed sequences must be tested experimentally to obtain a single successful design. Moreover, successful designs often require directed evolution to achieve catalytic activities or binding affinities similar to those of naturally occurring proteins. A major challenge that limits the accuracy of computational protein design is the inability to sufficiently sample protein sequence and conformational space at high resolution. Sampling is difficult because of the combinatorially large number of possible protein sequences and the inherent flexibility of the protein backbone, which may change conformation in response to changes in sequence. To address the problem of sampling a large number of sequences, I developed a deterministic computational protein design algorithm that identifies all sequences within a given energy of the global minimum energy sequence. To identify an accurate method of modeling backbone flexibility, I created a benchmark that evaluates designed sequences based on their similarity to natural sequences with respect to amino acid covariation. Lastly, I developed a novel method of coupling side-chain and backbone sampling. I applied this method to redesigning enzyme substrate specificity and showed a substantial improvement in accuracy over previous computational protein design methods. Taken together, these results demonstrate the importance of modeling protein backbone flexibility and provide new tools to enable higher-accuracy computational protein design.
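As a rough illustration of what "all sequences within a given energy of the global minimum energy sequence" means, the toy sketch below enumerates every sequence exhaustively and keeps those inside an energy window. It is not the algorithm developed in the dissertation, which the abstract does not describe, and brute-force enumeration is only feasible for very small problems. The two-position model, the three-letter alphabet, the energy values, and the names `sequence_energy` and `sequences_within` are all hypothetical.

```python
from itertools import product

# Hypothetical toy model: two design positions, three allowed amino acids.
AMINO_ACIDS = "AVL"

# Made-up one-body energies, one table per design position.
SELF_ENERGY = [
    {"A": 0.0, "V": -1.0, "L": -0.5},
    {"A": -0.2, "V": 0.3, "L": -1.2},
]

# Made-up two-body energies between positions 0 and 1; unlisted pairs are 0.0.
PAIR_ENERGY = {("V", "L"): -0.4, ("L", "L"): 0.8}


def sequence_energy(seq):
    """Sum the one-body terms and the single pairwise term for a 2-residue sequence."""
    total = sum(SELF_ENERGY[i][aa] for i, aa in enumerate(seq))
    total += PAIR_ENERGY.get((seq[0], seq[1]), 0.0)
    return total


def sequences_within(window):
    """Return (sequence, energy) pairs within `window` of the lowest energy found.

    Exhaustive enumeration is a stand-in here: it is deterministic and complete,
    but it scales exponentially with the number of design positions.
    """
    scored = sorted(
        (sequence_energy(s), "".join(s))
        for s in product(AMINO_ACIDS, repeat=len(SELF_ENERGY))
    )
    best_energy = scored[0][0]
    # Small tolerance so floating-point rounding does not exclude boundary cases.
    return [(seq, e) for e, seq in scored if e - best_energy <= window + 1e-9]


if __name__ == "__main__":
    for seq, energy in sequences_within(1.5):
        print(f"{seq}  {energy:.2f}")
```

With a window of zero, only the minimum-energy sequence is returned; widening the window adds near-optimal alternatives, which is the set that a design protocol would then carry forward for further evaluation.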