Background
Comprehensive molecular profiling has revealed somatic variations in cancer at genomic, epigenomic, transcriptomic, and proteomic levels. The accumulating data has shown clearly that molecular phenotypes of cancer are complex and influenced by a multitude of factors. Conventional unsupervised clustering applied to a large patient population is inevitably driven by the dominant variation from major factors such as cell-of-origin or histology. Translation of these data into clinical relevance requires more effective extraction of information directly associated with patient outcome.Methods
Drawing from ideas in supervised text classification, we developed survClust, an outcome-weighted clustering algorithm for integrative molecular stratification focusing on patient survival. survClust was performed on 18 cancer types across multiple data modalities including somatic mutation, DNA copy number, DNA methylation, and mRNA, miRNA, and protein expression from the Cancer Genome Atlas study to identify novel prognostic subtypes.Results
Our analysis identified the prognostic role of high tumor mutation burden with concurrently high CD8 T cell immune marker expression and the aggressive clinical behavior associated with CDKN2A deletion across cancer types. Visualization of somatic alterations, at a genome-wide scale (total mutation burden, mutational signature, fraction genome altered) and at the individual gene level, using circomap further revealed indolent versus aggressive subgroups in a pan-cancer setting.Conclusions
Our analysis has revealed prognostic molecular subtypes not previously identified by unsupervised clustering. The algorithm and tools we developed have direct utility toward patient stratification based on tumor genomics to inform clinical decision-making. The survClust software tool is available at https://github.com/arorarshi/survClust .