eScholarship
Open Access Publications from the University of California

UC San Diego

UC San Diego Electronic Theses and Dissertations

A Diverse Set of Evolutionary Questions that have been Answered Using Completely Sequenced Genomes

Abstract

In the past decade or so, the availability of completely sequenced genomes and their annotations has opened up previously unthinkable opportunities to explore evolutionary and functional aspects of organic life forms. The rate of deciphering new genomes shows no signs of slowing down. While a few years ago every self-respecting genomicist could name every single available genome, I would now be hard pressed to name those that have been completed in the past year alone. Whether or not the race to sequence more, faster and cheaper will continue to revolutionize our understanding of biology is an open question largely irrelevant to this thesis. However, it is evident that in the past decade a new set of approaches, tools and know-how has emerged in the field of computational genomics, which will continue to be used for many years to come. It so happened that in the past three years I have been interested in several different questions that forced me to utilize almost every aspect of comparative computational genomics. In the course of answering questions that tickled my curiosity I have created a database, performed an evolutionary analysis of a newly sequenced genome, worked with secondary and crystal structures of proteins and RNA, measured the rate and mode of selection in coding and noncoding sequences, and made functional annotations of proteins, and in doing so have added to our understanding of several important evolutionary questions. Thus, this thesis is more a demonstration of my capabilities as a computational comparative genomicist than a comprehensive attempt to resolve some long-standing dispute in biology. The first part of this thesis deals with some aspects and examples of compensatory evolution in the framework of Compensatory Pathogenic Deviations. The second part is a collection of works in which the primary concept is the use of negative selection to reveal novel functional and evolutionary aspects of genes and genomes.
In Chapter 1, I describe a database of mitochondrial tRNA sequences and secondary structures from completely sequenced metazoan mitochondrial genomes. This database was compiled mostly by hand, such that secondary structure predictions were matched with evolutionarily conserved regions while eliminating annotation errors, resulting in 6060 curated tRNA structure predictions. After its completion, but before publication, this database was used to describe patterns of compensatory evolution in mt tRNAs, which is the topic of Chapter 2. Prior to my work, it was thought impossible to create an exclusively computational method for predicting pathogenic mutations in human mt tRNAs. I have been able to show the contrary using two simple improvements. Firstly, I have shown that sequences of more closely related species are much better predictors of fitness impacts in orthologous human sites than sequences of distant species. This is an intuitive concept, but it had not previously been utilized in a predictive manner. Secondly, I have used patterns of compensatory evolution in tRNA stem structures, which also greatly increased the power to predict pathogenicity in orthologous sites. It is not enough to look at sequence conservation of a site to claim functional conservation, since sites may evolve quickly even while being under functional and selective constraint. Such rapid evolution is most easily reconciled with functional conservation under the framework of structural compensatory evolution. For example, in a tRNA the nucleotides forming a Watson-Crick pair in a stem structure may rapidly change between a G-C pair and an A-U pair. The destruction of either pair may be deleterious, yet each site may be rapidly evolving. By keeping track of sites potentially evolving in a compensatory manner, I have been able to further improve my prediction of pathogenic mutations in mt tRNAs.
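The stem-pair reasoning above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used in the thesis: the function names and category labels are hypothetical, only strict Watson-Crick pairs are counted (G-U wobble pairs are ignored), and T is treated as U so that gene sequences can be passed in directly.

```python
# Illustrative sketch only; names and labels are hypothetical, not from the thesis.
# Assumption: only strict Watson-Crick pairs count as paired (no G-U wobble).
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def normalize(base):
    """Upper-case a base and treat DNA 'T' as RNA 'U'."""
    base = base.upper()
    return "U" if base == "T" else base


def is_paired(five_prime, three_prime):
    """True if the two stem nucleotides form a Watson-Crick pair."""
    return (normalize(five_prime), normalize(three_prime)) in WATSON_CRICK


def classify_stem_pair(human, other):
    """Compare one stem pair (5' base, 3' base) between human and another species."""
    human = tuple(normalize(b) for b in human)
    other = tuple(normalize(b) for b in other)
    if human == other:
        return "identical"
    if not is_paired(*human):
        return "unpaired in human"
    if is_paired(*other):
        # The sequence differs but pairing is preserved, so under strict
        # Watson-Crick rules both sites must have changed together.
        return "compensated"
    return "pairing disrupted"
```

For instance, `classify_stem_pair(("G", "C"), ("A", "U"))` falls in the "compensated" category, whereas `classify_stem_pair(("G", "C"), ("A", "C"))` is reported as "pairing disrupted". A site that looks unconserved in an alignment may therefore still be constrained once its pairing partner is taken into account.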
In Chapter 3 I use the sequences of the γ-crystallins from several mammalian species to study compensatory evolution in this gene family. A disease-causing variant in one of the γ-crystallins was found in other, healthy mammals. Such an event is called a Compensatory Pathogenic Deviation (CPD), and is thought to be caused by structural compensations in the homologous proteins. In this case, using a correlation sequence analysis it was possible to identify a probable compensatory site in the γ-crystallin. Curiously, on the crystal structure this site was in a 180-degree symmetrical position relative to the site with the pathogenic substitution. This allowed us to conclude that crystallins are likely to be packed together such that individual proteins are assembled in strings with alternating 180-degree rotations. In addition, these genes showed interesting patterns of gene conversion. Two γ-crystallin pseudogenes showed clear signs of negative selection despite clearly being pseudogenes. This observation, coupled with signs of gene conversion in this gene family, led to the conclusion that gene conversion can lead to apparent selection in cases where the rate of conversion is rapid. Chapter 4 is entirely devoted to the issue of selection on synonymous sites in human protein-coding genes. Contrary to general belief, negative selection does not always lead to a decrease in the rate of evolution. If a preferred nucleotide is highly mutable, then the rate of evolution may be increased in comparison to a completely neutral site. This occurs because preferred-to-unpreferred substitutions reach fixation through drift, driven by the high rate of mutation, while selection drives the reverse process of unpreferred-to-preferred substitution. Mammalian synonymous sites appear to be a mixture of sites with different rates of mutation spanning almost two orders of magnitude due to the highly mutable CpG context.
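The claim that selection can speed up evolution at a highly mutable site can be checked with a standard two-state mutation-selection-drift model. The sketch below is an assumption-laden illustration rather than the model actually fitted in Chapter 4: it considers a single site with one preferred and one unpreferred state, takes `u` as the preferred-to-unpreferred mutation rate and `v` as the reverse rate, and uses the scaled selection coefficient `S = 4*Ne*s` in favor of the preferred state. All function and parameter names are hypothetical.

```python
import math


def fixation_factor(S):
    """Fixation probability of a new mutation relative to a neutral one.

    S is the scaled selection coefficient (assumed here to be 4*Ne*s);
    positive S means the mutation is favored, negative S means it is disfavored.
    """
    if abs(S) < 1e-9:
        return 1.0  # neutral limit
    return S / (1.0 - math.exp(-S))


def relative_rate(u, v, S):
    """Equilibrium substitution rate relative to a neutral site with the same u and v.

    u: mutation rate from the preferred to the unpreferred state
    v: mutation rate from the unpreferred to the preferred state
    S: scaled selection in favor of the preferred state
    """
    to_unpreferred = u * fixation_factor(-S)  # fixes by drift against selection
    to_preferred = v * fixation_factor(S)     # fixes with the help of selection
    # At equilibrium the two substitution fluxes balance, which gives
    # an overall rate of 2ab/(a+b) for per-state rates a and b.
    selected = 2.0 * to_unpreferred * to_preferred / (to_unpreferred + to_preferred)
    neutral = 2.0 * u * v / (u + v)
    return selected / neutral
```

Under these assumptions, `relative_rate(100, 1, 1.0)` is above 1 (the preferred state is highly mutable, so the site evolves faster than a neutral one), while `relative_rate(1, 100, 1.0)` and `relative_rate(1, 1, 1.0)` are below 1, the familiar slowdown caused by negative selection.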
The analysis reported in this chapter has shown that highly mutable synonymous sites evolve faster than intron sites with the same CpG context, while synonymous sites outside CpGs, those with a low rate of mutation, evolve more slowly than intron sites outside the CpG context. Assuming weak selection preferring GC nucleotides in synonymous sites leads to a perfect fit between several independent observations and theoretical predictions. This work remains the most comprehensive study of negative selection on synonymous sites that utilizes both empirical observations and theory. Chapter 5 reports a genome-wide search for two glyoxylate cycle-specific enzymes, isocitrate lyase and malate synthase, in vertebrate genomes. The presence of the glyoxylate cycle in metazoans has always been controversial, with all textbooks in biochemistry claiming that the glyoxylate cycle is not present in higher animals. I utilized sequence pattern searches in completely sequenced genomes, and found both glyoxylate cycle-specific enzymes in non-mammalian vertebrates. In addition, malate synthase appears to be still functional in non-placental mammals while being present as a pseudogene in placentals. Interestingly, both of these enzymes show a high rate of horizontal gene transfer throughout eukaryote evolution. In Chapter 6 a new method to study selection in duplicated genes is described. The method assumes that substitutions in two gene copies with a high rate of paralogous gene conversion are under selection simultaneously in both genes and, therefore, that the strength of selection in such gene copies should resemble selection in a single-copy gene. Thus, by comparing substitutions that have, and have not, been subject to gene conversion, we were able to evaluate selection on two diverging gene copies. Interestingly, selection against nonsynonymous substitutions was stronger in two independently evolving gene copies than in gene copies that were evolving in concert.
This may be due to strong selection for maintaining functional uniformity of two gene copies. Chapter 7 deals with the organizational complexity of several metazoan genomes. The "beads on a string" model of gene arrangement has been abandoned since the discovery of a large fraction of nested genes. I was curious to analyze the evolution of such complex, nested gene arrangements. Through many genome comparisons it became clear that in recent evolutionary history complex, nested gene arrangements have been created much more commonly than destroyed, implying a constant increase in genome organizational complexity. Analysis of the expression patterns of nested gene pairs found no evidence that selection plays a role in this increase of complexity. This study is the first that looked at the evolution of complexity by analyzing evolution
