eScholarship
Open Access Publications from the University of California

UC Berkeley Electronic Theses and Dissertations

Finding structure in disorder: Evolutionary analyses of disordered proteins in Drosophila

Abstract

Living systems are governed by the interactions between large collections of atoms called macromolecules. Proteins are the most important class of these macromolecules; they are the molecular machines that carry out a cell's processes. Although proteins are linear chains of simpler building blocks called amino acids, many accomplish their functions by folding into well-defined three-dimensional structures. For many years scientists believed fixed structures were necessary for protein function, but by the early 2000s evidence had accumulated that segments with no fixed spatial relationship between their atoms are ubiquitous in proteins. Because these intrinsically disordered regions (IDRs) are highly flexible, they can interact with diverse binding partners, which makes them essential for many cellular processes related to signaling and regulation.

Although our understanding of the structure and function of IDRs has grown significantly over the past two decades, predicting their functions from their amino acid sequences remains a major challenge. Because IDRs are structurally unconstrained, their sequences evolve rapidly and are therefore not amenable to traditional bioinformatics techniques, which depend on the precise order of amino acids to make comparisons with known proteins. There is increasing evidence, though, that IDRs conserve distributed features such as their chemical composition or net charge, and a recent study clustered IDRs with similar patterns of conserved features into groups with distinct functions. That study, however, was restricted to IDRs in a set of yeast genomes, so it is unclear whether these global relationships between conserved features and function are unique to yeast or a general property of IDR evolution. Thus, in this work I conduct a series of evolutionary analyses of IDRs in the genomes of 33 species of fruit flies to detect patterns of conservation.
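To make the idea of a "distributed feature" concrete, the following is a minimal Python sketch, not the method used in this dissertation. It computes amino acid composition, net charge per residue (NCPR), and fraction of charged residues for two hypothetical IDR segments. The charge model is deliberately simplified (only K/R count as positive and D/E as negative; histidine, termini, and pKa shifts are ignored), and the sequences are invented for illustration. The point is that two segments can share almost no residues position by position yet have identical composition and net charge.

```python
from collections import Counter

# Simplified charge model (an assumption): K/R positive, D/E negative;
# histidine, chain termini, and pKa shifts are ignored.
POSITIVE = set("KR")
NEGATIVE = set("DE")
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq: str) -> dict[str, float]:
    """Fraction of each of the 20 standard amino acids in seq."""
    counts = Counter(seq)
    return {aa: counts[aa] / len(seq) for aa in AMINO_ACIDS}


def net_charge_per_residue(seq: str) -> float:
    """(count of K/R minus count of D/E) divided by sequence length."""
    pos = sum(aa in POSITIVE for aa in seq)
    neg = sum(aa in NEGATIVE for aa in seq)
    return (pos - neg) / len(seq)


def fraction_charged(seq: str) -> float:
    """Fraction of residues that are K, R, D, or E."""
    return sum(aa in POSITIVE or aa in NEGATIVE for aa in seq) / len(seq)


def percent_identity(a: str, b: str) -> float:
    """Position-by-position identity of two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return 100 * sum(x == y for x, y in zip(a, b)) / len(a)


def composition_distance(a: str, b: str) -> float:
    """Euclidean distance between the composition vectors of a and b."""
    ca, cb = composition(a), composition(b)
    return sum((ca[aa] - cb[aa]) ** 2 for aa in AMINO_ACIDS) ** 0.5


# Two hypothetical, pre-aligned IDR segments: seq_b is a reshuffling of seq_a.
seq_a = "SEKSPDRKGSES"
seq_b = "KSSEGSKPESRD"

print(f"identity: {percent_identity(seq_a, seq_b):.1f}%")               # identity: 8.3%
print(f"composition distance: {composition_distance(seq_a, seq_b)}")    # composition distance: 0.0
print(f"NCPR: {net_charge_per_residue(seq_a)} vs {net_charge_per_residue(seq_b)}")  # NCPR: 0.0 vs 0.0
```

An identity-based comparison would treat these segments as nearly unrelated, while the composition and charge features are unchanged. That contrast is why feature-based comparisons are attractive for rapidly evolving IDRs.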

These comparisons, however, require identifying IDRs that share common ancestry and perform equivalent functions across many distinct organisms. Since the first genomes were sequenced in the late 1990s, researchers have developed techniques for identifying and aligning such proteins, called orthologs. While these methods are generally effective, they rely on automated computational pipelines that are prone to errors when processing the highly divergent sequences characteristic of many IDRs. The close evolutionary relationships among the genomes of related species generally make such mistakes easier to identify, and fortunately, over the past five years, advances in DNA sequencing technology have dramatically increased the number of sequenced genomes in the Drosophila genus. However, because existing methods for ortholog identification were designed for fewer or more distantly related genomes, they do not fully leverage this genomic redundancy to minimize errors.

Thus, in the first chapter, I develop a novel method for identifying orthologs that addresses this shortcoming and apply it to 33 Drosophila genomes to generate a set of aligned orthologs. In the second chapter, I identify rapidly evolving IDRs in these alignments and analyze them with a variety of evolutionary models to dissect the forces driving their evolution and detect patterns of conservation. Finally, in the third chapter, I discuss several software tools and tutorials for fitting statistical models to data, which I created while pursuing the previous aims.
