In recent years, advances in technology have made it possible to collect complex data, and analyzing such data can be challenging. My dissertation focuses on developing new methodologies and computational algorithms for nonparametric and semiparametric regression models to analyze complex and large-scale data. Chapter 1 introduces commonly used semiparametric models and their properties. Chapter 2 reviews the B-spline approximation of nonparametric functions. Chapter 3 provides an overview of methodologies for analyzing correlated data, including generalized estimating equations and mixed models.
In Chapter 4, we propose a flexible generalized semiparametric model for repeated measurements by combining the generalized partially linear single-index model with the varying-coefficient model. The proposed model is a useful analytic tool for exploring the dynamic patterns that naturally exist in longitudinal data and for studying possible nonlinear relationships between the response and covariates. We then employ the quadratic inference function to develop a procedure for estimating the unknown regression parameters and nonparametric functions. To select variables and estimate parameters simultaneously, we further obtain penalized estimators. Moreover, we establish theoretical properties of the parametric and nonparametric estimators. Simulations and an empirical example are presented to illustrate the use of the proposed model.
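As a rough illustration of the kind of model described above, one common way to combine a single-index component with varying coefficients for repeated measurements is sketched below. All notation here is assumed for illustration and need not match the formulation used in Chapter 4: the response y_ij, covariates x_ij and z_ij, observation time t_ij, link function g, index vector α, unknown functions η(·) and β(·), and B-spline basis B(·). The last two displays give the standard form of the quadratic inference function, in which the inverse working correlation is expanded in known basis matrices M_1, ..., M_K.

```latex
% Illustrative notation only; not necessarily the dissertation's formulation.
% Subject i = 1,...,N; observation j = 1,...,n_i; observation time t_{ij}.
\begin{align*}
  g(\mu_{ij}) &= \eta\bigl(\mathbf{x}_{ij}^{\top}\boldsymbol{\alpha}\bigr)
      + \mathbf{z}_{ij}^{\top}\boldsymbol{\beta}(t_{ij}),
  \qquad \mu_{ij} = E\bigl(y_{ij}\mid \mathbf{x}_{ij},\mathbf{z}_{ij},t_{ij}\bigr),
  \qquad \|\boldsymbol{\alpha}\| = 1, \\
  % B-spline approximation of the unknown index function
  \eta(u) &\approx \mathbf{B}(u)^{\top}\boldsymbol{\gamma}, \\
  % Extended score for subject i (D_i: derivative of mu_i w.r.t. theta,
  % A_i: diagonal matrix of marginal variances)
  \mathbf{g}_i(\boldsymbol{\theta}) &=
  \begin{pmatrix}
    \mathbf{D}_i^{\top}\mathbf{A}_i^{-1/2}\mathbf{M}_1\mathbf{A}_i^{-1/2}
      (\mathbf{y}_i-\boldsymbol{\mu}_i) \\
    \vdots \\
    \mathbf{D}_i^{\top}\mathbf{A}_i^{-1/2}\mathbf{M}_K\mathbf{A}_i^{-1/2}
      (\mathbf{y}_i-\boldsymbol{\mu}_i)
  \end{pmatrix},
  \qquad
  \bar{\mathbf{g}}_N(\boldsymbol{\theta}) = \frac{1}{N}\sum_{i=1}^{N}\mathbf{g}_i(\boldsymbol{\theta}), \\
  % Quadratic inference function, minimized over theta
  Q_N(\boldsymbol{\theta}) &= N\,\bar{\mathbf{g}}_N(\boldsymbol{\theta})^{\top}
      \mathbf{C}_N(\boldsymbol{\theta})^{-1}\,\bar{\mathbf{g}}_N(\boldsymbol{\theta}),
  \qquad
  \mathbf{C}_N(\boldsymbol{\theta}) = \frac{1}{N}\sum_{i=1}^{N}
      \mathbf{g}_i(\boldsymbol{\theta})\mathbf{g}_i(\boldsymbol{\theta})^{\top}.
\end{align*}
```

In a sketch of this form, the parameter vector θ would collect α together with the spline coefficients, and a penalized version would add a sparsity-inducing penalty on the parametric components to Q_N(θ).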
In Chapter 5, we propose a semiparametric model for genome-wide association studies (GWAS). The use of linear mixed models (LMMs) in GWAS is now widely accepted because LMMs have been shown to correct for several forms of confounding due to genetic relatedness among sampled individuals. At the same time, gene-environment (G × E) interactions play a pivotal role in determining the risk of human diseases. Conventional parametric models such as LMMs may not capture underlying nonlinear G × E interactions, which can result in serious bias. Therefore, we propose a semiparametric mixed model to investigate important gene associations in the presence of possibly nonlinear G × E interactions in GWAS. We further propose a profile maximum likelihood procedure to estimate the parameters and nonparametric functions, and apply restricted maximum likelihood to estimate the variance components. We establish consistency and asymptotic normality of the resulting parametric and nonparametric estimators. Moreover, we develop a Rao-score-type test and employ a multiple testing procedure to identify important genetic factors. Simulation studies and an empirical example are presented to illustrate the use of the proposed model and methods.
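To make the structure of such a model concrete, the display below gives a minimal sketch of a semiparametric mixed model with a nonlinear G × E effect. The notation is assumed for illustration and is not taken from Chapter 5: y_i is the phenotype, z_i are parametric covariates, s_i is the genotype at the marker under test, e_i is an environmental variable, β(·) is an unknown smooth function, and K is a kinship (genetic relatedness) matrix.

```latex
% Illustrative notation only; not necessarily the dissertation's formulation.
% Individual i = 1,...,n; one genetic marker tested at a time.
\begin{align*}
  y_i &= \mathbf{z}_i^{\top}\boldsymbol{\alpha} + s_i\,\beta(e_i) + u_i + \varepsilon_i, \\
  % Random genetic effect captures relatedness; residuals are independent
  \mathbf{u} = (u_1,\dots,u_n)^{\top} &\sim N\bigl(\mathbf{0},\,\sigma_u^{2}\mathbf{K}\bigr),
  \qquad
  \boldsymbol{\varepsilon} = (\varepsilon_1,\dots,\varepsilon_n)^{\top}
      \sim N\bigl(\mathbf{0},\,\sigma_\varepsilon^{2}\mathbf{I}_n\bigr), \\
  % Marginal covariance of the phenotype vector
  \operatorname{Var}(\mathbf{y}) &= \sigma_u^{2}\mathbf{K} + \sigma_\varepsilon^{2}\mathbf{I}_n, \\
  % No association between the marker and the phenotype at any environment level
  H_0 &: \beta(\cdot) \equiv 0 .
\end{align*}
```

Under this sketch, a constant β(·) corresponds to an ordinary LMM main effect, so the nonparametric term is what allows the genetic effect to vary nonlinearly with the environment; the variance components σ_u² and σ_ε² are the quantities that restricted maximum likelihood would target.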