Dimension Reduction Methods for Microarrays with Application to Censored Survival Data
Recent research has shown that gene expression profiles can potentially be used to predict phenotypes such as cancer type and survival time in biomedical research. Microarray technology, which simultaneously measures the expression values of thousands of genes, provides a powerful tool for relating gene expression profiles to phenotypes, but it also poses new challenges. Expression data are often very high-dimensional, which makes statistical modeling difficult, especially when phenotypes such as time to death or cancer recurrence are subject to right censoring. In this paper we consider a model-free sufficient dimension reduction technique for reducing the dimension of microarray data in the context of censored survival data; the technique does not assume a particular model for survival time given gene expression values. After dimension reduction, the constructed gene expression components are used as covariates for predicting survival probabilities within the framework of censored data regression. In particular, we use the popular Cox proportional hazards model to build a predictive model for survival. We demonstrate the methodology by applying it to a large gene expression data set on diffuse large B-cell lymphoma (DLBCL), consisting of 240 patients and 7399 genes. The Cox proportional hazards model with the derived gene expression components provides good predictive performance for patients' survival, as demonstrated by receiver operating characteristic (ROC) analysis. The predictive model built on the training data set predicted a highly significant survival difference in the testing data.
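The abstract does not name the specific dimension reduction procedure. As a rough illustration of the general pipeline, the sketch below makes two assumptions of its own: a principal component step first, because inverting a covariance matrix fails when genes outnumber patients, followed by sliced inverse regression (SIR) that slices on both observed time and censoring status ("double slicing"). This is one common model-free approach to censored responses and is not necessarily the authors' exact method. The function names `pca_reduce` and `censored_sir` are hypothetical, and the data are synthetic.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project centered data onto its leading principal components.

    Assumed pre-step: SIR inverts a covariance matrix, which is singular
    when genes (columns) outnumber patients (rows)."""
    Xc = X - X.mean(axis=0)
    # Thin SVD; rows of Vt are the principal axes.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def censored_sir(X, time, status, n_slices=5, n_directions=2):
    """Sliced inverse regression with double slicing on (time, status).

    Assumption: observations are split by censoring status, then each group
    is cut into slices of roughly equal size by observed time. Returns a
    (p, n_directions) matrix of unit-length estimated directions."""
    n, p = X.shape
    mu = X.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(X, rowvar=False))
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - mu) @ inv_sqrt  # whitened predictors

    # Weighted covariance of the within-slice means of Z.
    M = np.zeros((p, p))
    for s in (0, 1):
        group = np.flatnonzero(status == s)
        if group.size == 0:
            continue
        ordered = group[np.argsort(time[group])]
        for idx in np.array_split(ordered, min(n_slices, group.size)):
            m = Z[idx].mean(axis=0)
            M += (idx.size / n) * np.outer(m, m)

    # Leading eigenvectors of M, mapped back to the original X scale.
    w, v = np.linalg.eigh(M)
    top = v[:, np.argsort(w)[::-1][:n_directions]]
    B = inv_sqrt @ top
    return B / np.linalg.norm(B, axis=0)


# Synthetic check: survival depends on X only through one direction, beta.
rng = np.random.default_rng(0)
n, p = 2000, 5
X = rng.standard_normal((n, p))
beta = np.array([1.0, -1.0, 0.0, 0.0, 0.0]) / np.sqrt(2)
event_time = np.exp(X @ beta + 0.3 * rng.standard_normal(n))
censor_time = rng.exponential(3.0, n)  # independent right censoring
time = np.minimum(event_time, censor_time)
status = (event_time <= censor_time).astype(int)  # 1 = event observed

B = censored_sir(X, time, status, n_directions=1)
cos = abs(B[:, 0] @ beta)  # close to 1 when the direction is recovered
components = (X - X.mean(axis=0)) @ B  # covariates for a Cox model

# Wide "microarray-like" input: 50 samples x 300 genes reduced to 10 PCs.
scores = pca_reduce(rng.standard_normal((50, 300)), 10)
```

For data shaped like the DLBCL set (240 patients, 7399 genes), a pipeline of this kind would call `pca_reduce` before `censored_sir`. The resulting `components` would then be passed as covariates to a Cox proportional hazards fitter, for example lifelines' `CoxPHFitter`. Slicing within each censoring-status group keeps censored and uncensored observations from being pooled, since an observed time means something different in each group.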