eScholarship
Open Access Publications from the University of California

UC San Diego

UC San Diego Electronic Theses and Dissertations

Effective design and analysis of systems genetics studies

Abstract

Systems genetics studies, which aim to unravel the genetic basis of complex traits, have become one of the most promising research areas with the advance of high-throughput biotechnologies. This thesis identifies several computational and statistical challenges in the effective design and analysis of systems genetics studies, and presents novel methodological advances and corresponding results in several specific contexts. First, I present an extensive haplotype analysis of a recently collected catalogue of genetic variation among inbred mouse strains, which revealed the contribution from ancestral subspecies, the haplotype block structure, and the complex history of each genomic segment among the inbred mouse strains. In addition, I accurately imputed the uncollected genotypes in the resource by developing a novel and efficient genotype imputation method that adaptively learns parameters from data using an Expectation-Maximization (EM) algorithm. Our method is demonstrated to outperform previous methods on both mouse and human data. Statistical analyses in systems genetics studies are often confounded by unmodeled factors such as heterogeneous sample structure. Recent studies suggested that mixed models correct for sample structure in association mapping, but the available methods are too computationally expensive to be applied in genome-wide association mapping. I developed Efficient Mixed Model Association (EMMA), which takes advantage of the invariant structure of eigenvectors in applying mixed models for association mapping and thereby increases computational efficiency by several orders of magnitude. Our method was shown to successfully reduce inflated false positives in in silico genome-wide association mapping of inbred mouse strains involving hundreds of thousands of markers.
I further extend EMMA to accommodate the even larger scale of genome-wide association mapping in humans, typically involving several thousand or more individuals, and demonstrate that the method consistently eliminates the significant over-dispersion of test statistics across multiple human data sets. The method has also been employed to correct for a different type of confounding effect in expression studies. I developed a novel mixed-model method that corrects for the spurious associations and trans-regulatory bands caused by systematic confounding effects, using the inter-sample correlation of expression measurements. Finally, for the design of association studies using inbred strains, I propose a novel trait mapping strategy using the hybrid mouse diversity panel (HMDP). By integrating classical inbred strains and multiple sets of recombinant inbred strains while precisely accounting for the sample structure using high-density markers with EMMA, the proposed design is shown to identify previously known associations with much greater power and precision than previous approaches.
