Increasing the complexity of computational protein modeling methodologies for functional applications in biology
- Author(s): Barlow, Kyle Andrew
- Advisor(s): Kortemme, Tanja
- et al.
While the native states of proteins usually correspond to their free energy minima, and can often be determined with experimental techniques, predicting the folded native state of a protein computationally remains a major challenge. This is partly due to the immense conformational space that a single protein sequence could potentially fold into, a space that is even larger if the protein sequence is unknown, as in the case of design. In this thesis, I evaluate the performance of current state-of-the-art computational protein structure prediction and design methods (as implemented in the Rosetta macromolecular modeling software suite) on the following commonly encountered modeling problems: (1) estimation of the energetic effects of mutations (change in protein stability (∆∆G) and change in protein-protein interface binding energy post-mutation); (2) protein design predictions (native sequence recovery, evolutionary profile recovery, sequence covariation recovery, and prediction of recognition specificity); and (3) protein structure prediction (loop modeling). For each of these prediction problems, I assemble curated benchmark data that can be used for future evaluation of method performance on a common data set.
As the prior state-of-the-art methods for predicting the change in protein-protein interface binding energy post-mutation were not very effective for mutations to amino acids other than alanine, I created a new, more general Rosetta method for these cases. This “flex ddG” method generates ensembles of diverse protein conformational states (using “backrub” sampling) and uses them to predict interface ∆∆G values. Flex ddG is effective for predicting the change in binding free energy post-mutation for mutations to all amino acids, including mutations to alanine, and is particularly effective (when compared to prior methods) for small side chain to large side chain mutations. I show that the method succeeds in these cases due to increased sampling of diverse conformational states, as performance improves (up to a threshold) as more diverse states are sampled.
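The idea of predicting an interface ∆∆G from an ensemble can be illustrated with a minimal sketch. It is not the flex ddG implementation and makes no Rosetta calls. It assumes that each ensemble member has already been scored in four states (wild-type complex, unbound wild-type partners, mutant complex, unbound mutant partners) and that the ensemble estimate is a plain average of the per-member values. The names `EnsembleMemberScores`, `binding_ddg`, `ensemble_ddg`, and `running_ddg` are hypothetical and chosen only for illustration.

```python
from statistics import mean
from typing import List, NamedTuple, Sequence


class EnsembleMemberScores(NamedTuple):
    """Hypothetical scores (arbitrary energy units) for one ensemble member."""
    wt_complex: float    # wild-type complex
    wt_partners: float   # sum of the unbound wild-type partners
    mut_complex: float   # mutant complex
    mut_partners: float  # sum of the unbound mutant partners


def binding_ddg(s: EnsembleMemberScores) -> float:
    """Change in binding energy on mutation: dG_bind(mut) - dG_bind(wt)."""
    dg_mut = s.mut_complex - s.mut_partners
    dg_wt = s.wt_complex - s.wt_partners
    return dg_mut - dg_wt


def ensemble_ddg(scores: Sequence[EnsembleMemberScores]) -> float:
    """Assumed aggregation: unweighted mean over all ensemble members."""
    if not scores:
        raise ValueError("at least one ensemble member is required")
    return mean(binding_ddg(s) for s in scores)


def running_ddg(scores: Sequence[EnsembleMemberScores]) -> List[float]:
    """Estimate after 1, 2, ..., n members, to see how the value settles."""
    estimates = []
    total = 0.0
    for count, s in enumerate(scores, start=1):
        total += binding_ddg(s)
        estimates.append(total / count)
    return estimates


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    example = [
        EnsembleMemberScores(-100.0, -90.0, -98.0, -90.0),
        EnsembleMemberScores(-102.0, -90.0, -98.0, -90.0),
    ]
    print(ensemble_ddg(example))
    print(running_ddg(example))
```

Tracking the running estimate is one simple way to check whether adding more ensemble members still changes the prediction, which mirrors the observation that performance improves only up to a threshold as more states are sampled.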