Center for Bioinformatics and Molecular Biostatistics
Penalized Cox Regression Analysis in the High-Dimensional and Low-sample Size Settings, with Applications to Mi-croarray Gene Expression Data
- Author(s): Gui, Jiang
- Li, Hongzhe
- et al.
An important application of microarray technology is to relate gene expression profiles to various clinical phenotypes of patients. Success has been demonstrated in molecular classification of cancer in which the gene expression data serve as predictors and different types of cancer serve as a categorical outcome variable. However, there has been less research in linking gene expression profiles to the censored survival data such as patients' overall survival time or time to cancer relapse. Due to large variability in time to certain clinical event among patients, studying possibly censored survival phenotypes can be more informative than treating the phenotypes as categorical variables. We propose to use the L1 penalized estimation for the Cox model to select genes that are relevant to patients' survival and to build a predictive model for future prediction. The computational difficulty associated with the estimation in the high-dimensional and low-sample size settings can be efficiently solved by using the latest developed least angle regression method. Results from our simulation studies and application to real data set on predicting survival after chemotherapy for patients with diffuse large B-cell lymphoma demonstrate that the proposed procedure, which we call the LARS-Lasso procedure, can be used for identifying important genes that are related to time to death due to cancer and for building a parsimonious model for predicting the survival of future patients. The LARS-Lasso regression gives much better predictive performance than the L2 penalized regression or dimension-reduction based methods such as the partial Cox regression method.