Methodological advancements for genome reconstruction by haplotyping long read sequence data
- Pesout, Trevor W
- Advisor(s): Paten, Benedict
Abstract
Second-generation sequencing technology and its accompanying analyses produced a deluge of information about variation in human populations, enabling large-scale association studies and precision medicine. However, some genomic contexts cannot be analyzed with these technologies. With the advent of long-read sequencing, previously unmappable regions of the genome have become accessible, paving the way for more comprehensive analyses of the human genome. New methods are nonetheless required to leverage the increased length of these data and to mitigate their poor sequence accuracy. In this work, I present "Margin", an accurate and efficient application that uses a Hidden Markov Model to separate read and variant data into haplotypes. I describe work to validate the method and show its applicability in variant calling; I demonstrate ways to overcome systematic errors in nanopore sequence data and correct assembled sequence; and I document the tool's use in a state-of-the-art variant caller for Oxford Nanopore and PacBio HiFi data used to generate reference materials and make medical diagnoses.