The molecular system of the cell can be represented as a network in which genes or proteins are the nodes and the various types of interactions between them are the edges. This dissertation explores both the benefits and the challenges of using molecular network information to discover new candidate disease genes.
First, I present NSD1, a histone methyltransferase that, when mutated, confers improved survival in patients with HPV-negative head and neck squamous cell carcinoma (HNSCC). This project describes a standard approach for the discovery and validation of a single-gene biomarker. We identified NSD1 through bioinformatic analysis of clinical and genomic data. We then showed that disrupting NSD1 in vitro sensitizes HNSCC cells to cisplatin, a standard chemotherapy used to treat HNSCC.
In contrast to the NSD1 results, many cancer cohorts cannot easily be subtyped by the mutation status of a single gene or a small number of genes, because tumor genomes are heterogeneous. Therefore, I developed pyNBS, a software package implementing the network-based stratification (NBS) algorithm, which uses a molecular network to stratify cancer patients into prognostically relevant subtypes. The method assumes that patients with mutations in genes that lie near one another in the network have disruptions in the same molecular pathways, and that these shared disruptions manifest as similar outcomes. Our benchmarking shows that pyNBS executes the NBS algorithm faster and with fewer computational resources, and the package makes the methodology accessible to a broader audience.
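The network assumption described above can be illustrated with a minimal sketch. It is not the pyNBS implementation: the toy six-gene network, the three patient profiles, the restart weight `alpha`, and the function names are all hypothetical, and the propagation is a generic random walk with restart. The sketch shows only that two patients with no mutated gene in common can look similar once their mutation profiles are smoothed over the network, and it omits the clustering step that would follow.

```python
import numpy as np


def propagate(adjacency, profiles, alpha=0.7, tol=1e-9, max_iter=1000):
    """Random walk with restart: F <- alpha * F @ W + (1 - alpha) * F0.

    W is the row-normalized adjacency matrix; alpha is an assumed setting.
    """
    degree = adjacency.sum(axis=1, keepdims=True)
    degree[degree == 0] = 1.0          # avoid division by zero for isolated genes
    walk = adjacency / degree          # each row sums to 1
    initial = profiles.astype(float)
    smoothed = initial.copy()
    for _ in range(max_iter):
        updated = alpha * smoothed @ walk + (1 - alpha) * initial
        if np.abs(updated - smoothed).max() < tol:
            return updated
        smoothed = updated
    return smoothed


def cosine(u, v):
    """Cosine similarity between two profile vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


# Hypothetical network: genes 0-2 form one module, genes 3-5 another,
# and a single edge (2, 3) links the two modules.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adjacency = np.zeros((6, 6))
for i, j in edges:
    adjacency[i, j] = adjacency[j, i] = 1.0

# Hypothetical binary mutation profiles (rows: patients, columns: genes).
profiles = np.array([
    [1, 0, 0, 0, 0, 0],   # patient 1: gene 0 mutated (first module)
    [0, 1, 0, 0, 0, 0],   # patient 2: gene 1 mutated (first module)
    [0, 0, 0, 0, 0, 1],   # patient 3: gene 5 mutated (second module)
], dtype=float)

smoothed = propagate(adjacency, profiles)

raw_12 = cosine(profiles[0], profiles[1])
sim_12 = cosine(smoothed[0], smoothed[1])
sim_13 = cosine(smoothed[0], smoothed[2])
print(f"raw similarity, patients 1 and 2:      {raw_12:.3f}")
print(f"smoothed similarity, patients 1 and 2: {sim_12:.3f}")
print(f"smoothed similarity, patients 1 and 3: {sim_13:.3f}")
```

In this toy setting, patients 1 and 2 share no mutated gene, yet after propagation they are more similar to each other than either is to patient 3, whose mutation falls in the other module.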
In the final chapter of my thesis, I present a framework for evaluating molecular networks on their capacity to recover known disease-associated gene sets via network propagation. Given the number and diversity of molecular networks available, this framework establishes a benchmark for determining which network is appropriate for discovering new candidate genes for a particular disease. In this study, we found that larger functional interaction networks perform best at this task. We then constructed our own network, which we call PCNet, that outperformed all of the other networks evaluated. We now hope to leverage network propagation in other ways, including to aid the discovery of new candidate disease genes from genome-wide association studies.
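The idea of scoring a network by how well propagation recovers a known gene set can be sketched as follows. This is an illustration under stated assumptions, not the evaluation framework itself: the two toy networks, the gene indices, the split into seed and held-out genes, the restart weight `alpha`, and the use of average precision as the score are all hypothetical choices. Part of a known gene set seeds the propagation, and each network is scored by how highly it ranks the held-out genes among the non-seed genes.

```python
import numpy as np


def propagation_scores(adjacency, seeds, alpha=0.7):
    """Closed-form random walk with restart from a set of seed genes.

    Solves f = alpha * f @ W + (1 - alpha) * f0 for f, where W is the
    row-normalized adjacency matrix and f0 is uniform over the seeds.
    """
    n = adjacency.shape[0]
    degree = adjacency.sum(axis=1, keepdims=True)
    degree[degree == 0] = 1.0
    walk = adjacency / degree
    start = np.zeros(n)
    start[sorted(seeds)] = 1.0 / len(seeds)
    return np.linalg.solve((np.eye(n) - alpha * walk).T, (1 - alpha) * start)


def average_precision(scores, seeds, held_out):
    """Average precision of held-out genes among ranked non-seed genes."""
    ranked = [g for g in np.argsort(-scores).tolist() if g not in seeds]
    hits, precisions = 0, []
    for rank, gene in enumerate(ranked, start=1):
        if gene in held_out:
            hits += 1
            precisions.append(hits / rank)
    return float(np.mean(precisions))


def build(edges, n):
    """Symmetric adjacency matrix from an undirected edge list."""
    adjacency = np.zeros((n, n))
    for i, j in edges:
        adjacency[i, j] = adjacency[j, i] = 1.0
    return adjacency


# Hypothetical known gene set {0, 1, 2, 3}: two genes seed the propagation,
# and the other two are held out to be recovered.
seeds, held_out = {0, 1}, {2, 3}

# Network A places the gene set in one tightly connected neighborhood.
network_a = build([(0, 1), (0, 2), (1, 2), (1, 3), (2, 3),
                   (3, 4), (4, 5), (5, 6), (6, 7)], 8)
# Network B has the same number of edges but separates seeds from held-out genes.
network_b = build([(0, 1), (0, 4), (1, 5), (4, 5), (4, 6),
                   (5, 7), (6, 2), (7, 3), (2, 3)], 8)

ap_a = average_precision(propagation_scores(network_a, seeds), seeds, held_out)
ap_b = average_precision(propagation_scores(network_b, seeds), seeds, held_out)
print(f"network A average precision: {ap_a:.3f}")
print(f"network B average precision: {ap_b:.3f}")
```

A network whose topology places the members of the gene set close together scores higher on this recovery task. A fuller benchmark would repeat the split many times and compare each score against a suitable null model.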