Demography-aware inference of the strength of natural selection
- Author(s): Ortega Del Vecchyo, Vicente Diego
- Advisor(s): Novembre, John; Lohmueller, Kirk E; et al.
Levels of genetic, and possibly phenotypic, variation are influenced by how natural selection changes the frequency of deleterious and advantageous mutations. However, the demographic history of a population influences the efficacy of natural selection in keeping deleterious variants at low frequencies and in raising the frequency of advantageous mutations. I present three projects in which I study how natural selection works in the context of different demographic histories. In the first project, I use genomic data to study the early demographic history of dogs and wolves since their divergence. I infer population bottlenecks in both dogs and wolves and find evidence for gene flow between them after their divergence. I develop a summary statistic to identify the most plausible demographic model for dogs and wolves, and find support for a model in which dogs originated in a single location. This project lays the foundation for studying how advantageous and deleterious variants behave in the context of the bottlenecks found in dogs and wolves. In the second project, I leverage the demographic models I inferred to study how demographic processes have influenced levels of deleterious genetic variation in dogs, using 90 whole-genome sequences from breed dogs, village dogs and gray wolves. I use the ratio of heterozygosity at amino-acid changing variants to heterozygosity at silent variants to show how bottlenecks associated with domestication and breed formation in dogs have affected the efficacy of negative selection. I show multiple lines of evidence indicating that bottlenecks, and not inbreeding, are driving the patterns of deleterious genetic variation observed in dogs. In the third project, I develop a novel likelihood-based method that uses the lengths of pairwise haplotype identity by state among haplotypes carrying rare variants.
The method conditions on the present-day frequency of the allele and is based on theory predicting that, under constant population sizes, alleles under negative selection are on average younger than neutral alleles at the same frequency and should therefore show higher average levels of haplotype identity among variant carriers. I develop a computational framework to obtain the probability distribution of the lengths of pairwise haplotype identity given a selection coefficient, a demographic scenario and a present-day allele frequency. Simulations indicate that the method provides unbiased estimates of selection under both constant population sizes and realistic demographic scenarios. I show how the method can also be used to estimate the parameters that define the distribution of selection coefficients of a set of rare variants. I provide an example of how to apply this method to estimate the distribution of selection coefficients of a set of amino-acid changing variants in UK10K, a large genomic dataset of British individuals.
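The likelihood idea behind the third project can be illustrated with a minimal sketch. It is not the thesis implementation. It assumes that the probability of each haplotype-identity length class, given a selection coefficient, is already available in a lookup table, and it treats pairwise lengths as independent, which is a simplifying assumption of this sketch. The table `TOY_LENGTH_PROBS`, its three length bins, its candidate selection values and all of its numbers are hypothetical and chosen only to show the mechanics of picking the selection coefficient that maximizes the likelihood.

```python
import math

# Hypothetical lookup table (toy numbers, not results from the thesis):
# for each candidate selection coefficient, the probability that a pairwise
# haplotype-identity length falls in a short, medium or long bin, given a
# fixed demographic scenario and present-day allele frequency.
TOY_LENGTH_PROBS = {
    0.0: [0.50, 0.30, 0.20],
    -10.0: [0.40, 0.30, 0.30],
    -50.0: [0.25, 0.30, 0.45],
}


def log_likelihood(bin_counts, bin_probs):
    """Multinomial log-likelihood (up to a constant) of the binned lengths."""
    return sum(c * math.log(p) for c, p in zip(bin_counts, bin_probs) if c > 0)


def estimate_selection(bin_counts, table):
    """Return the candidate coefficient with the highest log-likelihood."""
    scores = {s: log_likelihood(bin_counts, probs) for s, probs in table.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Toy observation: counts of pairwise lengths in the short, medium, long bins.
observed_counts = [20, 30, 50]
best_s, scores = estimate_selection(observed_counts, TOY_LENGTH_PROBS)
print(best_s, scores)
```

In this toy setup, an excess of long identical tracts favors the most negative candidate coefficient, matching the expectation that negatively selected alleles are younger and share longer haplotypes among carriers. A real application would replace the lookup table with length distributions computed under the inferred demographic model.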