eScholarship
Open Access Publications from the University of California

UC Berkeley

UC Berkeley Electronic Theses and Dissertations

Taking an evolutionary approach to probe the protein energy landscape

No data is associated with this publication.
Abstract

The bulk of this dissertation consists of biophysical studies of two protein families, undertaken to gain insight into how the primary amino acid sequence of a protein encodes its function and properties, in particular its kinetic barrier and stability.

The sequence of a protein encodes its entire energy landscape, which includes all accessible conformations, their relative stabilities, and their dynamics. Mutations in the sequence change the landscape, which drives evolution. These evolutionary sequence variations can be used to probe the code between protein sequence and energy landscape. This is usually approached by comparing extant homologs: proteins with similar structures and functions encoded by different sequences. Some have taken computational approaches based on larger sets of sequences, such as inferring common ancestors using Ancestral Sequence Reconstruction (ASR) or generating a consensus protein, in which a sequence is built from the most common amino acid at each position. Proteins created via ASR or consensus approaches are often found to be more stable than the extant homologs, which raises the question of how the two methods differ and suggests that these approaches can serve as general methods to engineer thermostable protein variants. Recently, we used ASR to evaluate the evolutionary basis for the difference in thermodynamics between RNases H from a mesophilic and a thermophilic organism. In this work, I have used this same family to compare the biophysical properties of the consensus RNase H, the evolutionary ancestors, and the extant homologs. I find that while the consensus protein is folded and active, it does not show properties of a well-folded protein with enhanced thermodynamic stability. Moreover, I show that these properties are sensitive to the phylogenetic relationship of the set of input sequences used to design the consensus RNase H. These data suggest that the consensus approach is not a general method to engineer hyperstable proteins, and they show how the properties of designed consensus proteins vary depending on the input sequence set.
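The consensus idea described above (take the most common amino acid at each position) can be illustrated with a minimal sketch. This is not the procedure used in the dissertation: the function name `consensus_sequence`, the toy alignment, and the handling of gaps and ties are all assumptions made for illustration, and the input is assumed to be already aligned.

```python
from collections import Counter


def consensus_sequence(aligned_seqs, gap="-"):
    """Return the most common residue at each column of an alignment.

    Assumptions (hypothetical, for illustration only):
    - all sequences are pre-aligned and have equal length;
    - a column whose most common symbol is the gap character is dropped;
    - ties go to the residue seen first in the input order.
    """
    if not aligned_seqs:
        raise ValueError("need at least one sequence")
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")

    consensus = []
    # zip(*aligned_seqs) iterates over alignment columns.
    for column in zip(*aligned_seqs):
        residue, _count = Counter(column).most_common(1)[0]
        if residue != gap:
            consensus.append(residue)
    return "".join(consensus)


# Toy alignment (made-up sequences, not RNase H data).
toy_alignment = ["MKVLA", "MKILA", "MRVLG", "MKV-A"]
toy_consensus = consensus_sequence(toy_alignment)

# Consensus built from two different subsets of made-up sequences.
subset_a = consensus_sequence(["MKVLA", "MKILA", "MKVLA"])
subset_b = consensus_sequence(["MRVLG", "MRVLG", "MKVLG"])
```

In this toy case the two subsets yield different consensus sequences, which mirrors in a very simplified way the point that the designed sequence depends on which input sequences are chosen.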

Additionally, I use evolutionary information to study alpha-lytic protease (aLP) and its extreme kinetic stability. aLP has an unusual energy landscape: it is trapped in the native conformation by a large kinetic barrier that prevents unfolding. The native state of aLP is not its most thermodynamically stable state. In order to fold, aLP relies on a large N-terminal pro region that functions as a chaperone. Once the protein is folded, the pro region is removed, and the native state does not unfold on a biologically relevant time scale. Without the pro region, aLP folds on the time scale of millennia. A phylogenetic search uncovers aLP homologs with a wide range of pro-region sizes, including some with no pro region at all. In a phylogenetic tree, these homologs cluster by pro-region size. This clustering appears whether the tree is built with or without the pro region included, suggesting an evolutionary origin: the protease sequence alone encodes the pro-region size. I see a correlation between pro-region size and the size of the kinetic barrier. The homologs studied that lack pro regions are thermodynamically stable, fold on time scales much faster than aLP, and retain the same fold as aLP. Key features thought to contribute to aLP’s extreme kinetic stability are lost in these homologs, further supporting the important role of those features. This study highlights how sequence encodes more than the final structure of a protein.

Main Content

This item is under embargo until February 16, 2026.