Measuring Biochemical Possibility Spaces in Evolutionary Engineering
- Author(s): Pressman, Abe Daniel
- Advisor(s): Chen, Irene; Dey, Siddharth; et al.
At the molecular level, artificial selection (controlling the forces of evolution to improve or design new biochemical functions) is one of our strongest tools for finding better biocatalysts, pharmaceuticals, and biosensors, as well as for studying the history and process of evolution itself. But fully harnessing evolution requires knowledge of the shape and dynamics of complete evolutionary spaces. Prior to this work, very little research compared the real dynamics of artificial selection with the theoretical work written to support it. By updating the classical theory of simple selections towards an engineering focus, and combining this with direct observations of evolving populations, my work provides the first mathematical descriptions of how whole populations evolve during the selection of novel biocatalysts.
This work seeks to address the analysis of evolutionary fitness and chemical activity spaces at several levels. First, we offer a broad-ranging theoretical approach to mapping the distribution of fitness effects in any system under driven selection. Through both simulations and recent experimental data, we show that it is possible to estimate the initial distribution of fitness for nearly any selected population. In addition to potential applications in automated gene engineering, this theoretical solution also makes it possible to approximate the overall distribution of any selectable chemical function across random molecular space, a necessary condition for theoretical optimization of nearly any in vitro selection.
Zooming in, we next develop tools to view an entire population of active catalysts and how it changes over the course of a selection. Working with a model selection for de novo RNA triphosphorylation catalysts, we develop a new high-throughput method to measure many active catalysts in parallel, building the first portrait of how tens of thousands of different functional molecules enrich or disappear over the course of an entire artificial selection. New heuristics for assessing the effectiveness of various activity-estimation methods allowed us to efficiently identify highly active ribozymes and to estimate catalytic activity without performing any additional experiments. We also present the first picture of non-ideality during a real selection, demonstrating that stochastic effects can be a powerful and quantifiable confounding factor on predicted selection dynamics. This analysis also allows us to build the highest-resolution extant picture of a biocatalyst activity distribution, showing that catalytic activity is log-normally distributed, consistent with a mechanism in which activity emerges as the product of many independent contributions.
Finally, we design our own model selection to investigate the evolution of a theoretical aminoacylase RNA whose existence may have been crucial to the origin of the genetic code. Using this system, we have developed techniques for Sequencing to determine Catalytic Activity Paired with Evolution (SCAPE), a comprehensive workflow that allows complete mapping of large, dynamic landscapes of chemical activity. By measuring the catalytic activity of millions of evolved biomolecules simultaneously, we pair kinetic variations with genetic sequence at single-nucleotide resolution, building the first complete map of all evolutionary pathways to an engineered function from anywhere in genetic space. The resulting map contains approximately six orders of magnitude more data than any previously measured landscape of catalytic data, and suggests that features of genetic epistasis and evolutionary ruggedness may be remarkably consistent across many unrelated biocatalysts with similar function. Our methods and results suggest general applicability to more complicated systems, offering a viable alternative to the heuristic methods typically used to evaluate molecular selections and validating a suite of capable tools for quantifying and optimizing the emergence of a wide range of evolvable biocatalytic functions.
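The link drawn above between a log-normal activity distribution and activity arising as a product of many independent contributions can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the analysis used in the thesis: the function names, the number of contributions, and the uniform range of each contribution are all hypothetical choices made for illustration. It relies on the fact that the logarithm of a product is a sum of logarithms, so the log-activity tends towards a normal distribution.

```python
import math
import random
import statistics


def simulate_activities(n_molecules, n_factors, seed=0):
    """Draw toy 'activities', each the product of independent positive factors.

    ASSUMPTION: each factor is uniform on (0.5, 1.5). This range is arbitrary
    and is not taken from the thesis.
    """
    rng = random.Random(seed)
    return [
        math.prod(rng.uniform(0.5, 1.5) for _ in range(n_factors))
        for _ in range(n_molecules)
    ]


def skewness(values):
    """Population skewness: mean cubed deviation in units of the std. dev."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return sum(((v - mean) / std) ** 3 for v in values) / len(values)


if __name__ == "__main__":
    activities = simulate_activities(n_molecules=5000, n_factors=30, seed=1)
    log_activities = [math.log(a) for a in activities]
    # Raw activities should be strongly right-skewed, while log-activities
    # should be roughly symmetric, as expected for a log-normal variable.
    print("skewness of raw activities:", skewness(activities))
    print("skewness of log activities:", skewness(log_activities))
```

In this toy model, a near-zero skewness of the log-activities alongside a large positive skewness of the raw activities is the signature of an approximately log-normal distribution. A real analysis would compare measured activities against a fitted log-normal rather than rely on skewness alone.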