eScholarship
Open Access Publications from the University of California

Statistical Inference under the Multispecies Coalescent: Methods and Theory

  • Author(s): Guerra, Geno A
  • Advisor(s): Nielsen, Rasmus
  • et al.
Abstract

The rising availability of genome-scale data for a large number of species has enabled more in-depth studies of genetic variation between species using increasingly sophisticated methods. The accumulation of pairwise differences between individuals is indicative of how far they have diverged in time. The multispecies coalescent (MSC) has been the most popular framework for modeling the coalescent process in the presence of species barriers, such as a tree structure. Applying the MSC to growing amounts of data (loci and species) while keeping computation times feasible is the main focus of many emerging methods.

In this dissertation, I explore the use of the MSC in three different ways, using classical and novel statistical analyses to provide insight into species divergence parameters. I begin by constructing a novel statistical method for inferring species tree divergence times and population size parameters for any given tree topology from sequence data. The program COAL-PHYRE, presented here, applies the MSC marginally to pairs of individuals, as I demonstrate that pairwise information within the MSC is sufficient to learn times and population sizes on a tree. My focus then shifts to the derivation of the covariance between pairs of coalescence times and its application to studying average pairwise differences and the commonly used statistic Fst. I confirm that estimates of Fst are biased, and quantify the effect of not accounting for this bias in different applications. I conclude by extending the study of the covariance between coalescence times to the inference of species tree topologies. I define a metric based on these statistics which, when paired with the minimum spanning tree algorithm, provides estimates of species tree topologies. I provide partial proofs of statistical consistency of the approach.
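
As a rough illustration of the quantities named above, the following Python sketch computes average pairwise differences within and between two samples and a simple ratio form of Fst (one minus the within-sample average over the between-sample average). It is not taken from the dissertation and does not implement COAL-PHYRE or any bias correction; the function names and the toy sequences are hypothetical, and aligned sequences of equal length are assumed.

```python
from itertools import combinations, product


def hamming(a, b):
    """Count the sites at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def mean_within(pop):
    """Average pairwise differences among sequences from one sample."""
    pairs = list(combinations(pop, 2))
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


def mean_between(pop1, pop2):
    """Average pairwise differences between sequences from two samples."""
    pairs = list(product(pop1, pop2))
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


def fst(pop1, pop2):
    """Simple ratio form of Fst: 1 - (within average) / (between average).

    Assumption: an unweighted mean of the two within-sample averages.
    """
    within = (mean_within(pop1) + mean_within(pop2)) / 2
    between = mean_between(pop1, pop2)
    return 1 - within / between


# Hypothetical toy data: two samples of three aligned sequences each.
pop_a = ["ACGTACGT", "ACGTACGA", "ACGTACGT"]
pop_b = ["TCGTTCGT", "TCGTTCGA", "TCGTTCGT"]

print(mean_within(pop_a), mean_between(pop_a, pop_b), fst(pop_a, pop_b))
```

Because the estimator is a ratio of two noisy averages, its expectation generally differs from the ratio of their expectations, which is one way to see why estimates of this kind can be biased.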
