eScholarship
Open Access Publications from the University of California

UC Riverside

UC Riverside Electronic Theses and Dissertations

Methods for Comparative Genome Analysis With Applications to Pan-Genomics and Genome Annotation

Abstract

Comparative genomics is a powerful analytical tool for understanding the structure of genomes and their evolution. The tenet of comparative genomics is that evolutionarily conserved (and thus functionally important) genomic features shared by two species show significant similarity at the DNA or protein level. Recent technological advances in DNA sequencing instruments have allowed the number of sequenced genomes for different species to increase exponentially. The expanded set of available genomes has provided new opportunities to carry out comparative genome analyses at an unprecedented scale.

In this dissertation, we discuss and investigate a set of comparative genomics methods relevant to genome assembly, genome annotation, and pan-genome analysis. Comparative genomics can assist de novo genome assembly during the scaffolding phase and in the evaluation of assembly quality. The annotation phase takes advantage of comparative genomics by leveraging annotations from related species to predict coding and non-coding gene boundaries, intron/exon boundaries, repetitive elements, and many other genomic features. Functional annotation also relies on comparative genomics to assign putative functions to annotated genes using known functions of evolutionarily conserved genes and proteins. Finally, intraspecies comparative genomics is the cornerstone of pan-genome analyses, which allow one to determine which portions of the genome are common to all individuals and which portions vary among the individuals of a species. A new pan-genome representation and visualization method is introduced here to elucidate complex structural genomic variations.

Experimental results on the genomes of (1) Vigna unguiculata (cowpea or black-eyed pea), which provides a valuable source of protein to millions of people in developing countries, (2) Phytophthora infestans, an oomycete that causes a potato and tomato disease known as late blight, and (3) Babesia duncani, a tick-transmitted protozoan parasite that causes severe infection in immunocompetent individuals, demonstrate the effectiveness and utility of these comparative genomics methods.
