eScholarship
Open Access Publications from the University of California

UC Berkeley

UC Berkeley Electronic Theses and Dissertations

Inference of population history and mutation biology from human genetic variation

Abstract

Human genetic diversity bears many imprints of our species’ migration out of Africa. When our ancestral population expanded across the globe and settled into nearly every habitable environment, genetic diversity was reduced, distributed and redistributed by an intricate series of population bottlenecks and migrations. The coalescent process is a mathematical model that describes the probability distribution of modern DNA sequences that could have evolved from a common ancestor following a specific demographic scenario; in principle, this gives us the means to infer the history that is most consistent with large datasets of genomes sequenced from living people. In practice, however, the coalescent is so complicated that these calculations are intractable to perform exactly. Here, I introduce several new techniques for performing approximate demographic inference under the coalescent; these operate by condensing samples of genomes into more compact summary statistics and then mathematically approximating the probability distributions that these statistics should follow. I then use these techniques to infer joint demographic histories from European and African genomes, describing a complex out-of-Africa migration that involved multiple population size changes as well as a long period of migration between the diverging continental groups. The good match between predicted and observed genomic samples indicates that the coalescent is a useful framework for describing the evolution of humans; however, I also note systematic discrepancies between the model and the data. In the last two chapters of this thesis, I go on to show that some deviations of human data from coalescent predictions stem from the coalescent’s oversimplification of the way mutations are generated.
One standard assumption is that mutations occur independently; in contrast, at least 2% of human mutations occur in linked clusters and are likely to have been generated by multinucleotide mutation events (MNMs). Examining the derived and ancestral alleles of these MNMs, I show that they are enriched for transversions and that many bear the specific signature of error-prone Polymerase ζ. A second assumption I show to be violated is that the mutation rate has not changed over time and is constant across populations. I show this indirectly by demonstrating that C→T transitions, particularly in the context TCC→TTC, are more frequent in Europeans than in other populations. Although it is not clear whether this mutation rate change was functionally significant or driven by selection, it demonstrates that the process of genome evolution has not stayed constant during recent human history, but has been regionally differentiated by the forces that shape our primary genome sequences.
