eScholarship
Open Access Publications from the University of California

UC Merced

UC Merced Electronic Theses and Dissertations

Uncovering Deep Phylogenetic Signal in Plastid Genomes

Creative Commons 'BY-NC-ND' version 4.0 license
Abstract

The overall aim of my dissertation is to show that a novel source of phylogenetic information from the plastid genome, the tRNA interaction network, coupled with machine-learning and distance-based methods, is capable of accurately reconstructing deep phylogenetic relationships. First, I review the history of the plastid genome as a source of phylogenetic information, discuss sources of systematic bias in plastid sequence data, and introduce the transfer RNA (tRNA) interaction network as a source of phylogenetic data.

Second, I determine the phylogenetic origin of plastids within the Cyanobacteria tree of life (CyanoToL). Previous studies have strongly supported contradictory conclusions, with plastids branching either early or late within the CyanoToL. I begin by predicting structural features that determine the charging potential of a tRNA with its cognate amino acid, termed tRNA Class Informative Features (CIFs), for 113 Cyanobacterial genomes within eight Cyanobacterial clades. I show that predicted tRNA CIFs differ between Cyanobacterial clades in a phylogenetically informative way that can be exploited to accurately classify Cyanobacterial genomes using a machine-learning algorithm known as a multilayer perceptron (MLP), which we have named CYANO-MLP. I then use CYANO-MLP to test competing hypotheses of the origin of plastids by classifying 440 plastid genomes. I found support for the origin of plastids among a late-branching clade of starch-producing marine/freshwater diazotrophic cyanobacteria. Finally, I show that previously used phylogenetic models are unable to accommodate systematic biases, possibly explaining the conflicting hypotheses.

Third, I use tRNA CIFs to determine the phylogenetic placement of gnetophytes, a small clade of plants, within the seed plant phylogeny. The placement of gnetophytes has been contentious, with phylogenomic studies supporting several different relationships with cone-bearing seed plants (conifers). Here I use the Jensen-Shannon divergence to calculate a pairwise distance matrix between seed plant clades for plastid tRNA CIFs. Using standard distance-based phylogenetic algorithms, I found support for gnetophytes as sister to conifers.
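As a rough illustration of the distance step described above, the following is a minimal sketch of building a pairwise Jensen-Shannon divergence matrix in Python. It is not the dissertation's implementation: the function names, the clade labels, and the toy frequency profiles are all hypothetical, and it assumes each clade is summarized by a single discrete probability distribution (for example, nucleotide frequencies at one tRNA position). The actual analysis may combine many features or transform the divergence (for instance by taking its square root) before tree building.

```python
import math


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence (in bits) between two discrete distributions.

    p and q are sequences of probabilities over the same outcomes.
    With base-2 logarithms the result lies in [0, 1].
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    # Mixture distribution M = (P + Q) / 2
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence; terms with x_i == 0 contribute 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def distance_matrix(profiles):
    """Return (names, matrix) of pairwise divergences between named profiles."""
    names = sorted(profiles)
    matrix = [
        [jensen_shannon_divergence(profiles[a], profiles[b]) for b in names]
        for a in names
    ]
    return names, matrix


# Hypothetical toy profiles: frequencies of A, C, G, U at one tRNA position.
# These numbers are invented for illustration and are not real data.
toy_profiles = {
    "clade_A": [0.70, 0.10, 0.10, 0.10],
    "clade_B": [0.65, 0.15, 0.10, 0.10],
    "clade_C": [0.10, 0.10, 0.10, 0.70],
}

names, matrix = distance_matrix(toy_profiles)
for name, row in zip(names, matrix):
    print(name, " ".join(f"{value:.4f}" for value in row))
```

A matrix of this form is the kind of input that standard distance-based tree-building algorithms, such as neighbor joining, take.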

Lastly, I describe the implementation of two software packages. The first is tsfm (tRNA structure function mapper), which provides methods for predicting tRNA CIFs. The second is FAST (FAST Analysis of Sequences Toolbox), a suite of tools modeled after GNU Textutils for processing molecular sequence data on the command line.
