eScholarship
Open Access Publications from the University of California

UC San Diego

UC San Diego Electronic Theses and Dissertations

Algorithms and Methods for Characterizing Genetic Variability in Humans

Abstract

Characterizing genetic variation, including point mutations and structural variations, is key to understanding phenotypic variation in humans. The rapid development of sequencing technology has fueled the development of computational methods for elucidating genetic variation. In this dissertation, we develop novel computational methods that target two problems in human genetic variation using current and emerging sequencing technology. Capturing variation at the haplotype level is challenging with current sequencing technology because it involves linking together short sequenced fragments of the genome that overlap at least two heterozygous sites. While there has been much research on correcting errors to achieve accurate haplotypes, relatively little work has been done on designing sequencing experiments to obtain long haplotypes. With the development of new sequencing technology and experimental haplotyping methods, we parametrize the haplotyping problem in two contexts, strobe sequencing and clone-based haplotyping, and provide a theoretical and empirical assessment of the impact of different parameters on haplotype length. Variation in certain regions of the genome is harder to capture than in others. Reconstruction of the donor genome from whole genome sequence data is based either on de novo assembly of the short reads or on mapping reads to a standard reference genome. While these techniques work well for inferring 'simple' genomic regions, they are confounded by regions with complex variation patterns, including regions of direct immunological relevance such as the HLA and KIR regions. Characterizing these regions has previously relied on laboratory methods using traditional and quantitative PCR primers and probes, which can be labor- and time-intensive. We address the problem of ambiguous mapping in complex regions by defining a new scoring function for read-to-genome matchings. This scoring function is applied to predicted sequence assemblies of the KIR region in order to determine the most likely KIR haplotype groups of the donor. In another approach, we develop a novel method based on barcoding (deriving signatures of) known KIR templates in order to determine the copy number and allelic type of genes in the KIR region directly from whole genome sequencing data, without assembly or mapping.
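
The linking constraint mentioned in the abstract (a fragment only connects heterozygous sites if it overlaps at least two of them) can be illustrated with a minimal sketch. This is not the dissertation's method. It assumes each fragment has already been reduced to the set of heterozygous site indices it covers, and it ignores sequencing errors and allele calls. The function name `haplotype_blocks` and the input layout are hypothetical, chosen only for illustration. Sites joined through a chain of shared fragments form one block, and the number of sites per block is a rough proxy for haplotype length.

```python
from typing import Dict, List, Set


def haplotype_blocks(fragments: List[Set[int]]) -> List[List[int]]:
    """Group heterozygous sites into blocks connected by shared fragments.

    Hypothetical helper: each fragment is the set of heterozygous site
    indices it overlaps (an assumed input format, not from the source).
    """
    parent: Dict[int, int] = {}

    def find(x: int) -> int:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for frag in fragments:
        sites = sorted(frag)
        for s in sites:
            parent.setdefault(s, s)
        # A fragment covering fewer than two sites yields no pairs here,
        # so it cannot link anything; its site stays a singleton block.
        for a, b in zip(sites, sites[1:]):
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[rb] = ra

    blocks: Dict[int, List[int]] = {}
    for s in parent:
        blocks.setdefault(find(s), []).append(s)
    return sorted(sorted(b) for b in blocks.values())


if __name__ == "__main__":
    # Toy input: site 5 is seen only by a single-site fragment.
    print(haplotype_blocks([{1, 2}, {2, 3}, {5}, {7, 8}]))
```

Under these assumptions, experimental parameters such as fragment length or coverage would change the blocks only through the site sets each fragment covers, which is why singleton blocks correspond to sites that remain unphased.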
