Microarray Gene Expression Data with Linked Survival Phenotypes: Diffuse Large-B-Cell Lymphoma Revisited
Diffuse large-B-cell lymphoma (DLBCL) is an aggressive malignancy of mature B lymphocytes and is the most common type of lymphoma in adults. While treatment advances have been substantial in what was formerly a fatal disease, fewer than 50% of patients achieve lasting remission. In an effort to predict treatment success and explain disease heterogeneity, clinical features have been employed for prognostic purposes but have yielded only modest predictive performance. This has spawned a series of high-profile microarray-based gene expression studies of DLBCL, in the hope that molecular-level information could be used to refine prognosis. The intent of this paper is to reevaluate these microarray-based prognostic assessments and to extend the statistical methodology that has been used in this context.
Methodological challenges arise in using patients’ gene expression profiles to predict survival endpoints because of the large number of genes and their complex interdependence. We initially focus on the Lymphochip data and analysis of Rosenwald et al. (2002). After describing the relationships between the analyses performed there and gene harvesting (Hastie et al., 2001), we argue for the utility of penalized approaches, in particular LARS-Lasso (Efron et al., 2004). While these techniques have been extended to the proportional hazards/partial likelihood framework, the resulting algorithms are computationally burdensome. We develop residual-based approximations that eliminate this burden yet perform similarly. Comparisons of predictive accuracy across both methods and studies are made using time-dependent ROC curves. These indicate that gene expression data, too, deliver only modest predictions of post-therapy DLBCL survival. We conclude by outlining possibilities for further work.
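The abstract does not spell out how the residual-based approximation works. What follows is a minimal sketch, assuming it amounts to three steps: fit a covariate-free (null) Cox model, compute its deviance residuals, then run an ordinary least-squares lasso with those residuals as the response in place of the full partial-likelihood lasso. All function names are hypothetical. Plain coordinate descent stands in for the LARS path algorithm; both solve the same lasso problem at a fixed penalty. The code assumes no gene column is constant.

```python
import numpy as np


def null_martingale_residuals(time, event):
    """Martingale residuals delta_i - H0(t_i) from a null (covariate-free) Cox model,
    with H0 the Nelson-Aalen cumulative hazard estimate."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=float)
    event_times = np.unique(time[event == 1])
    # Nelson-Aalen increment d(s) / n_at_risk(s) at each distinct event time s
    increments = np.array([
        np.sum((time == s) & (event == 1)) / np.sum(time >= s) for s in event_times
    ])
    cumhaz = np.array([increments[event_times <= t].sum() for t in time])
    return event - cumhaz


def null_deviance_residuals(time, event):
    """Deviance residuals sign(m) * sqrt(-2 [m + delta * log(delta - m)])."""
    event = np.asarray(event, dtype=float)
    m = null_martingale_residuals(time, event)
    # delta * log(delta - m) is taken as 0 for censored subjects (delta = 0)
    log_term = event * np.log(np.where(event > 0, event - m, 1.0))
    # clip tiny negative round-off before the square root
    return np.sign(m) * np.sqrt(np.maximum(-2.0 * (m + log_term), 0.0))


def lasso_cd(X, y, lam, n_iter=500, tol=1e-10):
    """Coordinate descent for (1/2n)||y - Xs b||^2 + lam * ||b||_1,
    where Xs is X with columns centred and scaled to unit (population) variance."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    b = np.zeros(p)
    r = y - y.mean()  # residual of the current fit (b = 0)
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # ||Xs_j||^2 / n == 1, so the coordinate update is a soft-threshold
            rho = Xs[:, j] @ r / n + b[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0)
            if new != b[j]:
                r -= Xs[:, j] * (new - b[j])
                max_change = max(max_change, abs(new - b[j]))
                b[j] = new
        if max_change < tol:
            break
    return b


def residual_lasso(X, time, event, lam):
    """Hypothetical residual-based stand-in for a Cox lasso: lasso on null deviance residuals."""
    return lasso_cd(X, null_deviance_residuals(time, event), lam)
```

In practice the penalty `lam` would be chosen by cross-validation or by tracing a decreasing sequence of values, which is what the LARS-Lasso path provides in one pass. Because the residuals are computed once, each fit costs no more than a standard least-squares lasso, which is the kind of saving the abstract attributes to the approximation.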