Analysis of genomic variants via gene networks
- Author(s): Hofree, Matan
- Advisor(s): Ideker, Trey
- et al.
Genome-wide measurements of genomic state offer unprecedented opportunities for biological discovery, with the potential to make a dramatic impact on medicine and life. One fundamental challenge is associating complex phenotypes with their genetic causes. Here, I will describe efforts to advance solutions to this challenge via analysis of gene networks.
Genome-wide association studies are designed to link a phenotype to genomic loci anywhere in the genome; however, applying standard statistics to such data has fallen far short of building accurate predictive models for disease. We use AdaBoost, a large-margin classification algorithm, to predict disease status in two diabetes cohorts and suggest a method for overcoming limitations arising from correlation between genetic variants. We uncover a novel set of 163 disease associations missed by 'classic' statistics.
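To make the classification step concrete, the following is a minimal sketch of AdaBoost over single-variant decision stumps. It is an illustration only: the genotype matrix, the labels, the choice of stumps as weak learners, and all function names are assumptions made here, not the data or implementation used in the thesis, and the sketch does not address correlation between variants.

```python
import math

# Hypothetical toy data: rows are individuals, columns are minor-allele
# counts (0, 1, or 2) at three variants; y is +1 for case, -1 for control.
X = [[0, 1, 2], [0, 2, 0], [0, 0, 1], [1, 1, 0], [2, 0, 2], [1, 2, 1]]
y = [-1, -1, -1, 1, 1, 1]

def stump_predict(x, feature, threshold, sign):
    """Weak learner: vote `sign` if the allele count reaches the threshold."""
    return sign if x[feature] >= threshold else -sign

def best_stump(X, y, w):
    """Return (weighted error, feature, threshold, sign) of the best stump."""
    best = None
    for feature in range(len(X[0])):
        for threshold in (1, 2):
            for sign in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(xi, feature, threshold, sign) != yi)
                if best is None or err < best[0]:
                    best = (err, feature, threshold, sign)
    return best

def adaboost(X, y, rounds=5):
    """Fit a weighted ensemble of stumps with the standard AdaBoost updates."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, feature, threshold, sign = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, feature, threshold, sign))
        # Up-weight misclassified individuals, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, feature, threshold, sign))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def predict(model, x):
    """Sign of the alpha-weighted vote across all stumps."""
    score = sum(a * stump_predict(x, f, t, s) for a, f, t, s in model)
    return 1 if score >= 0 else -1

model = adaboost(X, y)
print([predict(model, x) for x in X])
```

In this toy setting only the first variant separates cases from controls, so the ensemble concentrates its weight there; real genotype data would involve many weakly informative, correlated variants.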
Classification of cancer remains predominantly organ based and fails to account for considerable heterogeneity of outcomes. Tumor genomes provide a new source of data for uncovering subtypes, but are difficult to compare, as tumors share few mutations in common. We introduce network-based stratification (NBS), a method for integrating somatic genomes with networks encoding biological knowledge. This allows for identification of cancer subtypes by clustering tumors with mutations in similar network regions. We demonstrate NBS in multiple cancer cohorts, identifying subtypes predictive of clinical features and outcomes, and highlighting sub-networks characteristic of each.
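The idea of comparing tumors by network region rather than by exact mutated gene can be sketched as follows. This example assumes that mutation profiles are smoothed over the network with a random walk with restart before being compared; the abstract does not specify the smoothing or clustering procedure, so that choice, the toy network, the tumor profiles, and all names below are hypothetical.

```python
# Hypothetical gene network: two disconnected modules of three genes each.
EDGES = [("A", "B"), ("B", "C"), ("A", "C"),
         ("D", "E"), ("E", "F"), ("D", "F")]

def neighbors(edges):
    """Build an undirected adjacency map from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def indicator(mutated, adj):
    """Binary mutation profile over all genes in the network."""
    return {g: (1.0 if g in mutated else 0.0) for g in adj}

def propagate(mutated, adj, alpha=0.7, iterations=50):
    """Random walk with restart: F <- alpha * (F spread to neighbors) + (1 - alpha) * F0."""
    start = indicator(mutated, adj)
    score = dict(start)
    for _ in range(iterations):
        new = {}
        for g in adj:
            # Each neighbor spreads its score evenly across its own neighbors.
            incoming = sum(score[n] / len(adj[n]) for n in adj[g])
            new[g] = alpha * incoming + (1 - alpha) * start[g]
        score = new
    return score

def cosine(p, q):
    """Cosine similarity between two gene-score profiles."""
    dot = sum(p[g] * q[g] for g in p)
    norm = (sum(v * v for v in p.values()) * sum(v * v for v in q.values())) ** 0.5
    return dot / norm if norm else 0.0

adj = neighbors(EDGES)
tumors = {"tumor1": {"A"}, "tumor2": {"B"}, "tumor3": {"E"}}
smoothed = {name: propagate(genes, adj) for name, genes in tumors.items()}

# tumor1 and tumor2 share no mutated gene, but both hit the A-B-C module.
print(cosine(indicator(tumors["tumor1"], adj), indicator(tumors["tumor2"], adj)))
print(cosine(smoothed["tumor1"], smoothed["tumor2"]))
print(cosine(smoothed["tumor1"], smoothed["tumor3"]))
```

Before smoothing, the first two tumors look unrelated because they share no mutated gene; after smoothing, they become similar to each other and remain dissimilar to the third tumor, which is what lets a downstream clustering step group them.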
Current approaches for identifying cancer genes rely on the idea that particular perturbations, occurring in a subset of genes unique to each cancer type, are selected for because they confer a survival advantage on tumor cells. Such genes are expected to be enriched for mutations when examined across a population. Here we show that 30-50% of well-known cancer genes are not significantly elevated in mutation frequency. Despite this lack of enrichment, known cancer genes are enriched for mutations causing changes in amino-acid composition, protein structure properties, and conservation. Furthermore, we observe that 15-30% of cancer genes have altered mutation rates conditioned on other genes, each individually spanning the range of single-gene mutation frequencies, implicating a large genetic interaction network underlying human cancer. This suggests that a substantial number of cancer genes will never be identified by frequency alone.
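A frequency-based test of the kind described above can be sketched with a one-sided binomial tail probability. The cohort size, the background mutation probability, and the gene names and counts below are invented for illustration; this is not the statistical model used in the thesis, and practical methods typically adjust the background rate for covariates such as gene length.

```python
from math import comb

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical cohort: 200 tumors, and an assumed 2% chance that any given
# gene is mutated in a tumor by background processes alone.
N_TUMORS = 200
BACKGROUND = 0.02

# Hypothetical counts of tumors carrying a mutation in each gene.
observed = {"GENE_X": 15, "GENE_Y": 5}

p_values = {gene: binomial_upper_tail(count, N_TUMORS, BACKGROUND)
            for gene, count in observed.items()}

for gene, p in p_values.items():
    print(gene, p)
```

Under these assumed numbers, GENE_X stands out from the background while GENE_Y does not, even though a gene like GENE_Y could still be functionally important; a test based on frequency alone cannot identify such a gene.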