Data structures and algorithms for read mapping to pangenome graphs
eScholarship
Open Access Publications from the University of California

UC Santa Cruz

UC Santa Cruz Electronic Theses and Dissertations

Creative Commons 'BY' version 4.0 license
Abstract

The human reference genome is one of the most important foundational resources in biological research, but its utility as a reference for all people is limited by its lack of diversity. Pangenomes are an alternative representation of genomes that incorporates genetic variation from a population of individuals. Using a pangenome as a reference can mitigate the bias incurred by using the current standard reference genome, but because pangenomes are larger and more complex, tools that use them tend to be slower and less reliable than tools that use standard references. Mapping sequencing reads to a reference, the first step in many genomic pipelines, is a particularly challenging problem in a pangenome context. In this dissertation, I present my work developing data structures and algorithms to support read mapping to pangenome graphs. The pangenomic read mapping tools that I helped develop over the course of my PhD are as efficient as linear mappers and improve variant calling and genotyping results compared to standard tools. They are among the first practical pangenome mappers and are helping pave the way for the emerging field of pangenomics.
