Accurate Ancestral Inference and Multi-allelic Haplotype Phasing with MOSHPIT
- Eveloff, Ryan James
- Advisor(s): Gymrek, Melissa
Abstract
In this thesis, we present MOSHPIT (Multi-allelic Outbred Strain Haplotype Phasing and Inference Tool), a novel framework for analyzing genomic datasets from outbred populations. MOSHPIT integrates mutation models for different variant types into a single hidden Markov model (HMM), allowing it to use single nucleotide polymorphisms (SNPs), insertions/deletions (indels), and short tandem repeats (STRs) simultaneously. This multi-variant capability makes MOSHPIT the first tool of its kind. Evaluations on real data and simulated genotypes show that MOSHPIT is more accurate and faster than existing methods.
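To make the idea of one HMM with variant-specific mutation models concrete, here is a toy Python sketch. It is not MOSHPIT's implementation. It assumes that the hidden state at each site is the founder haplotype a single sample haplotype copies, that switches between founders occur at a uniform rate, that SNPs and indels are biallelic with a simple match/mismatch error model, and that STR genotyping errors follow a geometric stepwise model on the repeat-count difference. The names Site, emission, and viterbi, and all parameter values (eps, p_step, switch_rate), are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Site:
    kind: str              # hypothetical labels: "snp", "indel", or "str"
    founder_alleles: list  # allele (or STR repeat count) carried by each founder
    observed: object       # allele (or repeat count) observed in the sample


def emission(site, founder, eps=0.01, p_step=0.5):
    """Toy P(observed allele | sample copies `founder`) at one site (assumed model)."""
    expected = site.founder_alleles[founder]
    if site.kind in ("snp", "indel"):
        # Assumes a biallelic site: match with prob 1 - eps, mismatch with eps.
        return 1 - eps if site.observed == expected else eps
    if site.kind == "str":
        # Stepwise model: an error of d repeat units has total probability
        # eps * (1 - p_step) * p_step**(d - 1), split evenly between +d and -d.
        d = abs(site.observed - expected)
        if d == 0:
            return 1 - eps
        return eps * (1 - p_step) * p_step ** (d - 1) / 2
    raise ValueError(f"unknown variant kind: {site.kind}")


def viterbi(sites, n_founders, switch_rate=0.01):
    """Most likely founder-of-origin path for one haplotype (log-space Viterbi)."""
    stay = math.log(1 - switch_rate)
    move = math.log(switch_rate / (n_founders - 1))  # uniform over other founders
    score = [math.log(1 / n_founders) + math.log(emission(sites[0], k))
             for k in range(n_founders)]
    backpointers = []
    for site in sites[1:]:
        best_prev, new_score = [], []
        for k in range(n_founders):
            # Best predecessor state j for arriving in founder k at this site.
            cands = [score[j] + (stay if j == k else move) for j in range(n_founders)]
            j_best = max(range(n_founders), key=cands.__getitem__)
            best_prev.append(j_best)
            new_score.append(cands[j_best] + math.log(emission(site, k)))
        backpointers.append(best_prev)
        score = new_score
    # Trace back from the best final state.
    state = max(range(n_founders), key=score.__getitem__)
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


sites = [
    Site("snp", ["A", "G"], "A"),
    Site("str", [10, 14], 10),
    Site("indel", ["AT", "A"], "A"),
    Site("str", [12, 8], 8),
    Site("snp", ["C", "T"], "T"),
]
print(viterbi(sites, n_founders=2))  # [0, 0, 1, 1, 1]
```

In this sketch, each variant class contributes only its own emission branch while sharing one transition structure, which is one plausible reading of combining mutation models in a single HMM. With two founders, the decoded path switches from founder 0 to founder 1 where the indel, STR, and SNP evidence changes.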
MOSHPIT enables high-resolution analysis of outbred populations, facilitating studies of genotype-phenotype associations in outbred model organisms. A better understanding of genetic diversity and its effects on observable traits could advance our knowledge of biological processes, complex traits, and disease risk.
In summary, MOSHPIT represents a significant advance in the genetic analysis of outbred populations. By modeling SNPs, indels, and STRs jointly within a single HMM, it gives researchers a tool for dissecting genetic variation and its relationships with phenotypes.