Improving sequence alignment and variant calling through the process of population and pedigree-based graph alignment
eScholarship
Open Access Publications from the University of California

UC Santa Cruz

UC Santa Cruz Electronic Theses and Dissertations

Creative Commons 'BY' version 4.0 license
Abstract

In current sequencing methodology, a linear reference genome is used to detect genetic variants from collections of sequence reads. A linear reference can cause misalignment of reads that do not exactly match the reference, or of reads from regions where the copy number of a sequence in the reference does not match that of the sample. This is known as reference bias. In clinical genetics for rare diseases, the resulting reduction in genotyping accuracy in some regions has likely prevented the resolution of some cases. Pangenome graphs embed population variation into the reference structure to reduce reference bias, and further performance improvements are possible with the aid of pedigree information. In this dissertation I present my research on the methods developed to build programs that apply pangenome graphs to these problems. First, I share my work on streamlining a single-sample pangenome software workflow and the accuracy enhancements I have contributed within the pangenome effort. Next, I describe my methods for incorporating pedigree information within the pangenome framework and show how performance improves over standard pangenomes. I then describe an extension of this work that demonstrates the clinical application of this workflow. Finally, I cover various projects I have contributed to that catalogue detected variants and use them for deleterious classification.
