eScholarship
Open Access Publications from the University of California

UC Berkeley

UC Berkeley Electronic Theses and Dissertations

Probabilistic models and statistical inference in population genetics

Abstract

Advances in sequencing and genotyping technologies have enabled data collection at unprecedented scales. This data deluge has driven demand for new, scalable methods and has allowed genomic approaches to be used to answer a broader set of questions. In this dissertation I analyze probabilistic models arising in population genetics, and I develop statistical techniques for extracting information from sequencing and genotyping data. I begin by analyzing Lambda and Xi coalescents, which arise as models of the genealogy of a sample from a population with large variance in the number of offspring per individual, such as marine species with sweepstakes-like reproduction or viruses undergoing continuous, strong selection. In particular, I show how to compute the expected value of a common summary statistic--the site frequency spectrum--under such models, and prove theorems about the identifiability of the model from the observed frequency spectrum. Such results suggest that it may be possible to learn about the underlying biological processes from observed site frequency spectra. I then present a method to find segments of Neanderthal ancestry in present-day humans, and use that method to learn about selective pressures on Neanderthal ancestry. I find that the observed patterns of Neanderthal ancestry are consistent with simple negative selection, as opposed to hybrid incompatibilities. Lastly, I develop a fast method to infer fine-scale recombination rates and apply it to 26 diverse human populations, elucidating the evolutionary dynamics and molecular modifiers of local recombination rates.
