eScholarship
Open Access Publications from the University of California

UC Santa Cruz

UC Santa Cruz Electronic Theses and Dissertations

Phylogenetics in the Pandemic Era

Creative Commons 'BY' version 4.0 license
Abstract

The COVID-19 pandemic of 2020 was one of the first major global public health crises of the post-genomic era, and it inspired unprecedented levels of viral genome sequencing. In phylogenetics, the reconstruction of ancestral relationships between extant sequences, essentially no existing software could handle the full dataset in a timely and effective manner. Because phylogenetics is critical for identifying and tracking major variants, particularly the Variants of Concern (VOCs), scalable tools were urgently needed. Together with several collaborators, I developed an efficient toolkit for the construction, manipulation, and analysis of massive phylogenetic trees. Our core data structure, the mutation annotated tree (MAT), can store millions of SARS-CoV-2 genomes in less than a gigabyte. My key contribution was the development of matUtils, a C++ library and command-line toolkit for manipulating these highly compact data files. I additionally developed BTE, a highly efficient API that makes our phylogenetics software available in a Python environment. I subsequently developed analytical approaches that take advantage of these new tools and of the availability and massive scale of the SARS-CoV-2 data. Among these is scalable phylogeographic inference, delivered through the daily-updated website Cluster-Tracker. Cluster-Tracker uses a simple heuristic I developed to efficiently identify and present local transmission clusters for public health track-and-trace efforts. I also designed an approach to identifying novel SARS-CoV-2 strains and integrated it with the popular Pango lineage system. Altogether, this dissertation presents a body of work that contributes substantially to an effective global public health response to the SARS-CoV-2 pandemic.
