The advent of time- and cost-effective technologies for genotyping and sequencing human DNA has massively increased both the types and the amount of genetic data available for study. To make the best use of these data, new methods must be developed to better assess how human history affects genetics and how genetics affects human phenotypes such as height, eye color and disease risk.
This work presents five new methods that build upon each other to address this challenge. The first method leverages geographic information contained in rare genetic variation to infer the genetic ancestry of individuals at each location in the genome. It increases ancestry inference accuracy when applied to cohorts of continentally admixed individuals. This method also allows inference of local ancestry when studying cohorts containing subcontinentally admixed individuals.
The second method applies the idea of highly structured geographic information in rare variation to create a better variant filtering approach for finding the causal variation in monogenic disorders. By finding better estimates of allele frequencies both within and across populations, it reduces the number of variants that must be considered as potentially disease causing. This in turn reduces the time and cost of the necessary follow-up analyses.
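The idea of filtering on frequencies within as well as across populations can be illustrated with a minimal sketch. It is not the method developed in this work: the variant records, population labels, sample sizes and the 1% threshold are all hypothetical, and the frequencies are simply taken as given rather than estimated.

```python
# Hypothetical variant records: allele frequency (af) and sample size (n) per population.
variants = [
    {"id": "var1", "pops": {"popA": {"af": 0.0005, "n": 9000}, "popB": {"af": 0.0, "n": 1000}}},
    {"id": "var2", "pops": {"popA": {"af": 0.0, "n": 9000}, "popB": {"af": 0.08, "n": 1000}}},
    {"id": "var3", "pops": {"popA": {"af": 0.2, "n": 9000}, "popB": {"af": 0.15, "n": 1000}}},
]


def pooled_frequency(variant):
    """Sample-size-weighted allele frequency across all populations."""
    total_n = sum(p["n"] for p in variant["pops"].values())
    return sum(p["af"] * p["n"] for p in variant["pops"].values()) / total_n


def max_population_frequency(variant):
    """Highest allele frequency observed in any single population."""
    return max(p["af"] for p in variant["pops"].values())


def filter_variants(variants, frequency_fn, max_af=0.01):
    """Keep the ids of variants whose frequency_fn value is at or below max_af."""
    return [v["id"] for v in variants if frequency_fn(v) <= max_af]


# Filtering on the pooled frequency alone keeps var2, which is rare overall
# only because the population in which it is common is small.
pooled_candidates = filter_variants(variants, pooled_frequency)

# Requiring rarity in every population removes var2 as well.
per_population_candidates = filter_variants(variants, max_population_frequency)
```

In this toy data the pooled filter retains two candidates and the per-population filter retains one, which is the kind of reduction in follow-up burden described above.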
Because of the multiple-testing burden, it is difficult to detect compound heterozygous architectures and haplotype effects as contributors to complex diseases or gene regulation. The next two methods present ways to detect these complex features. Compared to standard marginal association approaches, these two methods show that compound heterozygous architectures and haplotype effect models often better capture the genetic contributions to traits. The results demonstrate the need for future fine-mapping approaches that seek complex causal architectures.
The final method in this work searches for causal relationships in gene expression networks. These networks are formed by genes with highly correlated expression levels. However, the correlation may be due to unobserved confounding variables. By utilizing genetic variants as instrumental variables, this method finds causal gene-on-gene effects. Knowing the direction and magnitude of gene-on-gene effects is vital to better understanding regulatory networks in disease pathways and to identifying drug targets.
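The role of a genetic variant as an instrumental variable can be illustrated with a small simulation. This is a sketch under stated assumptions, not the method developed in this work: it uses a single variant, a simple ratio (Wald) estimator, and simulated data in which the variant affects gene X only, an unobserved confounder affects both genes, and the true effect of X on Y is fixed at 0.5. All names and parameter values are hypothetical.

```python
import random


def covariance(a, b):
    """Sample covariance of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def simulate(n, true_effect, seed=0):
    """Simulate genotype g, expression x of gene X, and expression y of gene Y."""
    rng = random.Random(seed)
    g, x, y = [], [], []
    for _ in range(n):
        # Genotype: number of minor alleles (0, 1 or 2) at allele frequency 0.3.
        gi = int(rng.random() < 0.3) + int(rng.random() < 0.3)
        # Unobserved confounder that drives the expression of both genes.
        u = rng.gauss(0.0, 1.0)
        xi = 1.0 * gi + u + rng.gauss(0.0, 1.0)
        yi = true_effect * xi + u + rng.gauss(0.0, 1.0)
        g.append(gi)
        x.append(xi)
        y.append(yi)
    return g, x, y


g, x, y = simulate(n=20000, true_effect=0.5)

# Naive regression slope of y on x: inflated by the shared confounder.
ols_estimate = covariance(x, y) / covariance(x, x)

# Ratio (Wald) estimator: the variant's association with y divided by its
# association with x. The confounder is independent of the genotype, so its
# contribution cancels and the estimate is close to the true effect.
iv_estimate = covariance(g, y) / covariance(g, x)

print(f"naive estimate: {ols_estimate:.2f}, instrumental-variable estimate: {iv_estimate:.2f}")
```

The naive slope overstates the effect because the confounder raises both expression levels together, while the ratio estimator recovers a value near the simulated effect. The estimator is only valid if the variant influences Y solely through X, which is an assumption of this sketch.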