Advancements and Challenges in Protein Engineering: Unveiling Nuances, Machine Learning Endeavors, and Applications in Predictive Tools
- Huang, Peishan
- Advisor(s): Siegel, Justin B
Abstract
Chapter 1. Evaluating Protein Engineering Thermostability Prediction Tools Using an Independently Generated Dataset
Engineering proteins to enhance thermal stability is a widely used approach for creating industrially relevant biocatalysts. The development of new experimental datasets and computational tools to guide these engineering efforts remains an active area of research. To complement the previously reported measures of T50 and kinetic constants, we report an expansion of our previously published dataset of β-glucosidase mutants to include measurements of both TM and ΔΔG. For a set of 51 mutants, we found that T50 and TM are moderately correlated, with a Pearson correlation coefficient of 0.58 and a Spearman's rank coefficient of 0.47, indicating that the two methods capture different physical features. The stability predictions of nine computational tools were also evaluated on the dataset of 51 mutants; none was found to be a strong predictor of the observed changes in T50, TM, or ΔΔG. Furthermore, the ability of the nine algorithms to predict the production of isolatable soluble protein was examined, which revealed that Rosetta ΔΔG, FoldX, DeepDDG, PoPMuSiC, and SDM were capable of predicting whether a mutant could be produced and isolated as a soluble protein. These results highlight the need for new algorithms that predict modest, yet important, changes in thermal stability. They also point to a new use for current algorithms: prescreening designs to identify mutants that maintain their fold and can be produced as soluble protein.
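The comparison of T50 and TM in Chapter 1 rests on two correlation measures: Pearson, which captures linear agreement, and Spearman, which captures agreement in rank order. A minimal sketch of both is given below, assuming paired measurements are available as plain lists. The function names and the example values are hypothetical and are not taken from the thesis dataset or its analysis code.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    # Made-up illustrative values in degrees Celsius, NOT data from the thesis.
    t50_example = [39.5, 40.1, 38.7, 41.2, 40.4, 37.9]
    tm_example = [44.0, 45.2, 44.6, 45.9, 44.3, 43.1]
    print("Pearson:", round(pearson(t50_example, tm_example), 2))
    print("Spearman:", round(spearman(t50_example, tm_example), 2))
```

When the two coefficients are both moderate, as reported for the 51 mutants, the two measurements agree only partially in both magnitude and ordering, which is consistent with the interpretation that they capture different physical features.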
Chapter 2. Construction and biophysical evaluation of 277 β-glucosidase mutants to elucidate structure and function relationships
Enzyme engineering is transforming biotechnology, pharmaceuticals, and therapeutics by improving enzyme stability and catalytic efficiency. Computational tools accelerate this work, but quantitative predictions remain challenging. We expanded the Carlin mutant dataset to 277 diverse β-glucosidase B mutants and found that thermostability and kinetics can be engineered independently. Benchmarking revealed that Rosetta and amino acid features predict expression, and that Evolutionary Scale Modeling (ESM) predicts enzyme turnover rate. However, more robust machine learning predictions will require continued dataset expansion to improve quantitative accuracy and advance protein engineering tools.
Chapter 3. Computational design of human N-acetylglucosaminyltransferase I (hGnT-I) mutants with improved solubility and expression in E. coli
This study aimed to enhance the soluble expression and stability of a critical enzyme, human N-acetylglucosaminyltransferase I (hGnT-I), which is involved in processing glycoprotein N-glycans. N-glycosylation is a vital quality attribute of glycoprotein and glycopeptide therapeutics, affecting their solubility, stability, safety, efficacy, and immunogenicity. However, producing such mammalian enzymes in E. coli expression systems is challenging because of their complex post-translational modifications and folding. To overcome this hurdle, we explored the use of truncation and PROSS (Protein Repair One-Stop Shop), a computational approach that uses evolutionary information to suggest mutations, to improve the soluble expression and stability of hGnT-I. Our designs improved stability by enhancing molecular interactions and packing, leading to increased expression and activity. This study offers a novel approach to enhancing the expression and stability of glycosyltransferases (GTs), providing a valuable resource for glycoprotein engineering and modification.