Mixture and Mixed Models Analysis for Genetic Variants
- Author(s): Zhan, Haimao
- Advisor(s): Xu, Shizhong
- et al.
Advances in DNA sequencing technologies allow us to genotype most genetic variants and investigate their effects on phenotypes. Although many genes controlling Mendelian disorders have been successfully identified in the past two decades, the genetic mechanisms underlying complex traits, which are controlled by many genes with small effects, are still not well understood. It is therefore desirable to develop more powerful statistical methods that can integrate information from quantitative traits, gene expression and high-density genetic markers, and precisely identify the genetic variants underlying complex traits.
In Chapter 2, we developed a stochastic expectation-maximization (EM) algorithm for mixture model-based cluster analysis, which provides a general framework for the integrated study of genetic variants, gene expression and phenotypes. The strength of association is modeled using a Gaussian mixture with two components. The sampling step in the stochastic EM algorithm improves the convergence of the parameter estimates when the initial values are poor. The same mixture model and stochastic EM algorithm can also be used to identify expression QTL and to study the association between gene expression and quantitative traits.
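The general idea of a stochastic EM algorithm for a two-component Gaussian mixture can be illustrated with a minimal sketch. This is a generic, univariate textbook version and not the dissertation's implementation: the function names, the starting values, the burn-in length and the simulated data are all assumptions made for illustration. Each iteration computes posterior membership probabilities (E-step), draws hard component labels from them (S-step), and re-estimates the parameters from the completed data (M-step).

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def simulate_mixture(n=500, pi1=0.3, mu0=0.0, mu1=4.0, sd=1.0, seed=1):
    """Hypothetical test data: (1 - pi1) N(mu0, sd^2) + pi1 N(mu1, sd^2)."""
    rng = random.Random(seed)
    return [rng.gauss(mu1 if rng.random() < pi1 else mu0, sd) for _ in range(n)]


def stochastic_em(data, n_iter=200, burn_in=100, seed=0):
    """Stochastic EM for a two-component univariate Gaussian mixture.

    Returns the parameter values averaged over the post-burn-in iterations.
    """
    rng = random.Random(seed)
    n = len(data)
    # Deliberately crude starting values (an assumption of this sketch).
    pi1, mu0, mu1, sd0, sd1 = 0.5, min(data), max(data), 1.0, 1.0
    names = ("pi1", "mu0", "mu1", "sd0", "sd1")
    kept = []
    for it in range(n_iter):
        # E-step: posterior probability that each point belongs to component 1.
        post = []
        for x in data:
            a = pi1 * normal_pdf(x, mu1, sd1)
            b = (1.0 - pi1) * normal_pdf(x, mu0, sd0)
            post.append(a / (a + b) if a + b > 0.0 else 0.5)
        # S-step: sample a hard label for each point from its posterior.
        z = [1 if rng.random() < p else 0 for p in post]
        # M-step: update the parameters from the completed (labelled) data.
        g1 = [x for x, zi in zip(data, z) if zi == 1]
        g0 = [x for x, zi in zip(data, z) if zi == 0]
        if len(g1) < 2 or len(g0) < 2:
            continue  # skip a degenerate draw and resample next iteration
        pi1 = len(g1) / n
        mu1 = sum(g1) / len(g1)
        mu0 = sum(g0) / len(g0)
        sd1 = max(math.sqrt(sum((x - mu1) ** 2 for x in g1) / len(g1)), 1e-6)
        sd0 = max(math.sqrt(sum((x - mu0) ** 2 for x in g0) / len(g0)), 1e-6)
        if it >= burn_in:
            kept.append((pi1, mu0, mu1, sd0, sd1))
    if not kept:
        return dict(zip(names, (pi1, mu0, mu1, sd0, sd1)))
    return dict(zip(names, (sum(col) / len(kept) for col in zip(*kept))))


estimates = stochastic_em(simulate_mixture())
print(estimates)
```

Because the S-step replaces the soft weights of standard EM with random draws, the parameter sequence does not settle at a single point; averaging the post-burn-in values, as done here, is one common way to summarize it.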
In Chapter 3, we proposed a generalized linear mixed model for mapping segregation distortion loci, which can affect the viability of individuals in a population. This dissertation presents a method in which the segregation distortion analysis is formulated as a quantitative genetics problem using a hypothetical liability. The generalized linear mixed model includes the genetic variants across the whole genome and estimates their effects using a Bayesian approach that requires only a likelihood function, a linear predictor and a prior distribution. The mixed model approach is able to handle high-dimensional genomic data.
In Chapter 4, an adaptive ridge regression method is used to estimate the collective effects of rare variants within the same functional group for continuous traits. The adaptive ridge regression model makes no assumption about the directions of the effects. The shared variance for a group is used as a score for testing the overall effect of its rare variants. Genetic variants in the same group are selectively weighted to prevent the shared variance from being diluted by non-functional variants. The adaptive ridge regression method can be easily extended to handle multiple groups of rare variants.
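To give a feel for how variant-specific weighting can work inside a ridge regression, here is a minimal sketch of a generic iteratively reweighted ridge. It is not the estimator developed in the dissertation: the reweighting rule (each weight set to the squared current estimate plus a small constant), the penalty `lam`, the helper names and the simulated genotypes are all assumptions chosen for illustration, and the group-level variance test is not covered. The example includes two causal variants with opposite signs to show that no common effect direction is assumed.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def simulate_rare_variants(n=500, maf=0.05, effects=(2.0, -2.0, 0, 0, 0, 0), seed=3):
    """Hypothetical data: genotypes coded 0/1/2, two causal variants, unit noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        g = [(rng.random() < maf) + (rng.random() < maf) for _ in effects]
        X.append(g)
        y.append(sum(gi * e for gi, e in zip(g, effects)) + rng.gauss(0.0, 1.0))
    return X, y


def adaptive_ridge(X, y, lam=1.0, n_iter=20, eps=1e-4):
    """Iteratively reweighted ridge: variant j is penalized by lam / w[j]."""
    n, p = len(X), len(X[0])
    # Center the genotypes and the trait so that no intercept is needed.
    col_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    XtX = [[sum(Xc[i][j] * Xc[i][k] for i in range(n)) for k in range(p)]
           for j in range(p)]
    Xty = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    w = [1.0] * p  # equal weights: the first pass is an ordinary ridge fit
    beta = [0.0] * p
    for _ in range(n_iter):
        A = [row[:] for row in XtX]
        for j in range(p):
            A[j][j] += lam / w[j]
        beta = solve(A, Xty)
        # Variants with small estimates receive small weights (heavy shrinkage).
        w = [b * b + eps for b in beta]
    return beta, w


X, y = simulate_rare_variants()
beta, weights = adaptive_ridge(X, y)
print([round(b, 3) for b in beta])
```

In this scheme, variants whose estimates stay close to zero are shrunk more strongly at each pass, which loosely mirrors the idea of preventing non-functional variants from diluting the group signal.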