One and Two Locus Likelihoods Under Complex Demography
- Author(s): Kamm, John Arthur
- Advisor(s): Song, Yun S.
- et al.
The coalescent is a random process that describes the genealogy relating a sample of individuals, and provides a probability model that can be used for likelihood-based inference on genetics data. For example, coalescent models may include recombination, natural selection, population size crashes and growth, and migrations, and thus can be used to learn the strength of these biological and demographic forces. Unfortunately, computing the likelihood of data remains a challenging problem in many of these coalescent models.
In this dissertation, I develop new equations and algorithms for computing coalescent likelihoods at one or two loci, and apply them to inference problems in a composite likelihood framework. I begin by developing an algorithm for the one-locus case, computing the site frequency spectrum (the distribution of mutant allele counts) under complex demographic histories with population size changes (including exponential growth), population splits, population mergers, and admixture events. This method improves on the runtime and numerical stability of previous approaches, and can successfully infer demographic histories that would otherwise be too computationally challenging to consider. I then consider the two-locus case, and derive a formula for the likelihood at a pair of sites under a variable population size history; this formula scales to tens of individuals. In addition to this exact formula, I also develop a highly efficient importance sampler to compute the same likelihood. I apply these results to the problem of inferring recombination rates under variable population size.
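To make the site frequency spectrum concrete, the following minimal Python sketch tabulates an observed spectrum from a binary genotype matrix and, for comparison, gives the classical expectation under a single constant-size population, where the expected number of sites with k mutant copies is proportional to 1/k. It is an illustration only and is not the algorithm developed in the dissertation, which handles far more complex demographic histories. The function names, the input layout (one row per site, one column per sampled chromosome, 1 marking the mutant allele), and the scaled mutation rate `theta` are assumptions made for this example.

```python
import numpy as np


def site_frequency_spectrum(genotypes):
    """Tabulate the observed SFS from a (n_sites, n_samples) 0/1 matrix.

    Assumed layout (hypothetical, for illustration): each row is a site,
    each column a sampled chromosome, and 1 marks the mutant allele.
    Returns an array whose entry k-1 counts the sites with exactly k
    mutant copies, for k = 1, ..., n_samples - 1.
    """
    genotypes = np.asarray(genotypes, dtype=int)
    n_samples = genotypes.shape[1]
    # Number of mutant copies observed at each site.
    mutant_counts = genotypes.sum(axis=1)
    # Histogram over 0..n_samples, then drop the non-segregating classes
    # (0 copies and n_samples copies).
    full = np.bincount(mutant_counts, minlength=n_samples + 1)
    return full[1:n_samples]


def expected_sfs_constant_size(n_samples, theta=1.0):
    """Expected SFS for one constant-size population: theta / k.

    This is the textbook constant-size result, used here only as a
    baseline; `theta` is an assumed scaled mutation rate.
    """
    k = np.arange(1, n_samples)
    return theta / k


if __name__ == "__main__":
    # Four sites observed in a sample of four chromosomes (toy data).
    toy = [
        [1, 0, 0, 0],
        [1, 1, 0, 0],
        [0, 1, 0, 0],
        [1, 1, 1, 0],
    ]
    print(site_frequency_spectrum(toy))
    print(expected_sfs_constant_size(4))
```

Comparing an observed spectrum against the expectation under a candidate history is the basic idea behind spectrum-based demographic inference; departures from the 1/k shape are one signal of population size change.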