Genome-wide association studies (GWAS) have become a powerful tool for revealing the genetic architecture of complex traits in plants, animals, and human disease. The method scans genotypes from many samples to study associations between genetic markers and phenotypes. With the availability of low-cost, high-throughput technology, large-scale data are now available for analysis, and efficient algorithms are needed to scan up to millions of markers. Large-scale genomic studies also involve high-dimensional statistics, which raises many difficulties in modeling and computation in practice. This dissertation addresses two problems in GWAS: the computational efficiency of marker scanning and the correction of the Beavis effect. The connection between the fixed effect and the random effect in a linear mixed model is the inspiration for the current work. The proposed methods are supported both theoretically and empirically.
In the first half of this dissertation, we investigate significance tests of markers in GWAS and propose a method for constructing a de-shrink ridge estimator. This enables us to scan all markers simultaneously in one model. The de-shrink estimators and the associated test statistic are fast to compute, and they perform at a level comparable to conventional GWAS approaches such as efficient mixed-model association (EMMA). We also prove that, given sufficient information, the de-shrink estimators are asymptotically equivalent to the fixed-effect estimators in EMMA.
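As a rough illustration of what fitting all markers in one model means, the sketch below computes an ordinary ridge estimator for every marker jointly. It is not the de-shrink estimator proposed here, whose construction is not given in this abstract; the function name `ridge_all_markers`, the penalty `lam`, and the simulated genotypes are assumptions made only for this example.

```python
import numpy as np


def ridge_all_markers(Z, y, lam):
    """Plain ridge estimates for all markers at once (illustrative only).

    Z   : (n_samples, n_markers) genotype matrix, e.g. coded 0/1/2
    y   : (n_samples,) phenotype vector
    lam : ridge penalty (hypothetical tuning value, not from the source)
    """
    Zc = Z - Z.mean(axis=0)   # center each marker column
    yc = y - y.mean()         # center the phenotype
    m = Zc.shape[1]
    # Solve (Z'Z + lam * I) beta = Z'y for every marker effect jointly.
    return np.linalg.solve(Zc.T @ Zc + lam * np.eye(m), Zc.T @ yc)


# Toy data: 50 samples, 200 markers, a single causal marker at index 10.
rng = np.random.default_rng(1)
Z_demo = rng.binomial(2, 0.5, size=(50, 200)).astype(float)
y_demo = 1.0 * Z_demo[:, 10] + rng.normal(scale=0.5, size=50)

beta_weak = ridge_all_markers(Z_demo, y_demo, lam=1.0)
beta_strong = ridge_all_markers(Z_demo, y_demo, lam=100.0)

# A larger penalty pulls the whole coefficient vector toward zero.
print(np.linalg.norm(beta_weak), np.linalg.norm(beta_strong))
```

The shrinkage visible here, with estimates pulled toward zero as the penalty grows, is the bias that a de-shrink step would need to undo before ridge estimates can be compared with fixed-effect estimates.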
The second half of this dissertation focuses on correcting the bias caused by the Beavis effect in GWAS. The Beavis effect refers to the phenomenon in which the average estimated effect size of detected loci is inflated because only loci that pass a statistical test are reported. There is increasing interest in applying linear mixed models in GWAS, where the scanned marker is typically treated as a fixed effect; this is called the fixed model (FM) approach. An alternative is to treat the marker effect as random, which is called the random model (RM) approach. However, the random term imposes an extra computational burden. We develop a novel random-fixed model (RFM) approach to relieve this computational difficulty. Taking advantage of RFM and the censoring induced by the significance test, we propose an efficient way to correct the Beavis effect. We demonstrate the method on simulated datasets and in real-data applications.
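The inflation described above can be reproduced with a small simulation. The sketch below is only an illustration of the Beavis effect itself, not of the RFM-based correction; the function name `simulate_beavis` and all parameter values (true effect, sample size, threshold) are assumptions chosen for the example.

```python
import numpy as np


def simulate_beavis(true_effect=0.2, n_samples=100, n_reps=5000,
                    z_crit=3.0, seed=0):
    """Simulate single-marker tests and compare effect estimates
    averaged over all replicates versus over 'detected' replicates only.

    All defaults are hypothetical values picked for illustration.
    """
    rng = np.random.default_rng(seed)
    # One marker per replicate, genotypes coded 0/1/2.
    x = rng.binomial(2, 0.5, size=(n_reps, n_samples)).astype(float)
    y = true_effect * x + rng.normal(size=(n_reps, n_samples))

    # Simple linear regression of y on x within each replicate.
    xc = x - x.mean(axis=1, keepdims=True)
    yc = y - y.mean(axis=1, keepdims=True)
    sxx = (xc ** 2).sum(axis=1)
    beta = (xc * yc).sum(axis=1) / sxx

    # Residual variance and standard error of each slope estimate.
    resid = yc - beta[:, None] * xc
    sigma2 = (resid ** 2).sum(axis=1) / (n_samples - 2)
    se = np.sqrt(sigma2 / sxx)

    # A replicate counts as detected if its test statistic passes the threshold.
    detected = np.abs(beta / se) > z_crit
    return beta.mean(), beta[detected].mean(), detected.mean()


mean_all, mean_detected, frac_detected = simulate_beavis()
print(f"mean estimate, all replicates:      {mean_all:.3f}")
print(f"mean estimate, detected replicates: {mean_detected:.3f}")
print(f"fraction detected:                  {frac_detected:.3f}")
```

Averaged over all replicates the estimate stays close to the true effect, whereas averaging only over replicates that pass the threshold gives a clearly larger value. That selection step is the censoring that a correction has to account for.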