eScholarship
Open Access Publications from the University of California

UC Santa Cruz

UC Santa Cruz Electronic Theses and Dissertations

Enabling comparative genomics at the scale of hundreds of species

Creative Commons 'BY-SA' version 4.0 license
Abstract

Comparing related (homologous) subsequences between the genomes of different species gives insight into their function. This information is captured in “genome alignments”, which are essential for almost all comparative genomics analyses. However, most existing methods for creating a genome alignment either suffer from reference bias (where only one genome is fully aligned to all others) or ignore duplication events. Although the Cactus genome aligner avoided these restrictions, it could not align more than a few genomes without becoming cost-prohibitive and losing accuracy. I developed and refined a “progressive alignment” extension to Cactus that allows it to produce a full alignment in time linear in the number of input genomes while maintaining similar, and often improved, quality. This new method allows Cactus to align hundreds of large vertebrate genomes, enabling comparative genomics at an unprecedented scale. During its development, I used Cactus as an essential component of several successful comparative genomics projects. Working closely with the 200 Mammals and Bird 10K projects, I used Cactus to create an alignment of over 600 bird and mammal genomes, by far the largest genome alignment created to date. Finally, I used this alignment to produce a highest-possible-resolution annotation of mammalian and avian evolutionary constraint, using the uniquely large number of taxa to examine weak effects of purifying selection.
