Using long reads to improve haplotype phasing, genome assembly, and gene annotation
Despite their accuracy, next-generation DNA sequencing technologies have limited utility for analyzing ambiguous and repetitive regions of the genome because their reads are short. Third-generation long-read DNA sequencing technologies, such as those from Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio), make much more of the genome accessible and enable more comprehensive genomic analyses. However, new software must be developed to take advantage of the increased read lengths while mitigating base-level errors. In this thesis, I explore the advantages of long reads for haplotype phasing and genome assembly. I then use genome assemblies created from long reads to perform comparative genomics analyses, focusing on gene annotation of new, high-quality primate and human assemblies, including annotation of the first complete human genome and a human pangenome containing over 90 distinct haplotypes.