UC Santa Cruz
Identification and Mixture Deconvolution of Ancient and Forensic DNA using Population Genomic Data
- Author(s): Vohr, Samuel Henry
- Advisor(s): Green, Richard E
- et al.
Forensic scientists routinely use DNA for identification and to match samples with individuals. Although standard approaches are effective on a wide variety of samples in various conditions, issues such as low-template DNA samples and mixtures of DNA from multiple individuals pose significant challenges. Extreme examples of these challenges can be found in the field of ancient DNA, where DNA recovered from ancient remains is highly fragmented and marked by patterns of DNA damage. Additionally, ancient DNA libraries are often characterized by low endogenous DNA content and contaminating DNA from outside sources. As a result, standard forensic approaches, such as amplification of short tandem repeats, are not effective on ancient samples. Instead, ancient DNA is routinely sequenced directly using high-throughput sequencing to survey the molecules present within a library. However, the resulting sequences are not easily compared for the purposes of identification, as each data set represents a random and, in some cases, non-overlapping sample of the genome.
In this dissertation, I present two approaches for interpreting shotgun sequences that address two common issues in forensic and ancient DNA: extremely low nuclear genome coverage and mixtures of sequences from multiple individuals. First, I present an approach to test for a common source individual between extremely low-coverage sequence data sets that makes use of the vast number of single-nucleotide polymorphisms (SNPs) discovered by surveys of human genetic diversity. As almost no observed SNP positions will be common to both samples, our method uses patterns of linkage disequilibrium as modeled by a panel of haplotypes to determine whether observations made across samples are consistent with originating from a single individual. I demonstrate the power of this approach using coalescent simulations, downsampled high-throughput sequencing data, and published ancient DNA data. Second, I present an approach for interpreting mixtures of mitochondrial DNA sequences from multiple individuals. Mixed DNA samples are common in forensic investigations, either from the direct nature of a case (e.g., a sample containing DNA from both a victim and a perpetrator) or from outside contamination. I describe an expectation-maximization approach for detecting the mitochondrial haplogroups contributing to a mixture and partitioning fragments by haplogroup to reconstruct the underlying haplotypes. I demonstrate the approach’s feasibility, accuracy, and sensitivity on both in silico and in vitro sequence mixtures. Finally, I present the results of applying our mixture interpretation approach to ancient contact DNA recovered from ~700-year-old moccasin and cordage samples.
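The expectation-maximization idea behind the mixture approach can be illustrated with a minimal sketch. This is not the dissertation's implementation: it assumes a hypothetical precomputed table `lik[i][k]` giving the probability of observing fragment `i` if it came from candidate haplogroup `k`, and the function name `em_mixture` and its parameters are invented for illustration. The E-step computes how much each haplogroup is responsible for each fragment, the M-step re-estimates the mixture proportions from those responsibilities, and fragments are finally partitioned by their most probable haplogroup.

```python
def em_mixture(lik, n_iter=200, tol=1e-9):
    """Estimate mixture proportions and assign fragments to haplogroups.

    lik[i][k] is an assumed, precomputed P(fragment i | haplogroup k).
    Returns (proportions, assignments).
    """
    n, k = len(lik), len(lik[0])
    pi = [1.0 / k] * k  # start from equal mixture proportions

    def responsibilities(pi):
        # E-step: posterior probability of each haplogroup for each fragment
        resp = []
        for row in lik:
            w = [p * l for p, l in zip(pi, row)]
            total = sum(w)
            if total == 0.0:
                # fragment is uninformative under every candidate haplogroup
                resp.append([1.0 / k] * k)
            else:
                resp.append([x / total for x in w])
        return resp

    for _ in range(n_iter):
        resp = responsibilities(pi)
        # M-step: new proportion = average responsibility across fragments
        new_pi = [sum(r[j] for r in resp) / n for j in range(k)]
        converged = max(abs(a - b) for a, b in zip(pi, new_pi)) < tol
        pi = new_pi
        if converged:
            break

    # Partition fragments by their most probable haplogroup
    resp = responsibilities(pi)
    assign = [max(range(k), key=lambda j: r[j]) for r in resp]
    return pi, assign


# Toy input (made-up numbers): six fragments that fit haplogroup 0,
# two that fit haplogroup 1, and a third candidate that fits none.
toy_lik = [[0.98, 0.01, 0.01]] * 6 + [[0.01, 0.98, 0.01]] * 2
proportions, assignments = em_mixture(toy_lik)
```

In this toy input the third candidate haplogroup explains no fragment well, so its estimated proportion shrinks toward zero. That is how a proportion-based model of this kind can indicate which haplogroups are actually present in a mixture.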