Personalized cancer therapy is an emerging treatment strategy based on the ability to predict which patients are more likely to respond well to specific treatments. It involves the systematic use of genetic or other information about an individual patient to optimally select a course of treatment. This dissertation presents mathematical models and algorithms to predict drug response on a personalized level and to understand the causes of different responses.
Recent large cancer studies have measured somatic mutations in an unprecedented number of tumours. These large datasets make it possible to identify cancer-related sets of genetic mutations; in particular, we are interested in identifying groups of genetic mutations that are associated with positive or negative drug response. We propose a combinatorial formulation of the problem and prove that it is computationally hard. We design two optimization algorithms to solve the problem and implement them in our tool UNCOVER. We provide analytic evidence of the effectiveness of UNCOVER in finding high-quality solutions and show experimentally that UNCOVER finds sets of alterations significantly associated with functional targets in a variety of scenarios. In particular, we show that our algorithms find better sets than those obtained by the state-of-the-art method. Our algorithms are also much faster than the state-of-the-art method, allowing the analysis of large datasets of thousands of target profiles from cancer cell lines. Although we formulate the problem in general terms, we use UNCOVER to analyze drug response data, identifying sets of mutations associated with drug sensitivity.
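As a rough illustration of the kind of combinatorial problem described above, the following Python sketch greedily builds a small set of alterations whose altered samples carry high target weight. The objective used here (total target weight of covered samples, minus a per-sample penalty each time a sample is covered more than once), the names `set_score` and `greedy_select`, and the greedy strategy itself are assumptions made for illustration only; they are not taken from UNCOVER's implementation.

```python
from typing import Dict, List, Set


def set_score(
    genes: List[str],
    alterations: Dict[str, Set[str]],
    target: Dict[str, float],
    penalty: Dict[str, float],
) -> float:
    """Hypothetical objective: target weight of covered samples minus a
    penalty for every redundant alteration in the same sample."""
    counts: Dict[str, int] = {}
    for gene in genes:
        for sample in alterations[gene]:
            counts[sample] = counts.get(sample, 0) + 1
    covered = sum(target[sample] for sample in counts)
    overlap = sum((c - 1) * penalty[sample] for sample, c in counts.items())
    return covered - overlap


def greedy_select(
    alterations: Dict[str, Set[str]],
    target: Dict[str, float],
    penalty: Dict[str, float],
    k: int,
) -> List[str]:
    """Add, one at a time, the gene with the largest positive gain in
    score; stop after k genes or when no gene improves the score."""
    chosen: List[str] = []
    for _ in range(k):
        current = set_score(chosen, alterations, target, penalty)
        best_gene, best_gain = None, 0.0
        for gene in sorted(alterations):  # sorted for deterministic ties
            if gene in chosen:
                continue
            gain = set_score(chosen + [gene], alterations, target, penalty) - current
            if gain > best_gain:
                best_gene, best_gain = gene, gain
        if best_gene is None:
            break
        chosen.append(best_gene)
    return chosen
```

Penalizing redundant coverage is one simple way to favor sets whose alterations rarely co-occur in the same sample. A greedy procedure like this one gives no optimality guarantee in general.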
Our next contribution is a computational method, named NETPHIX (NETwork-to-PHenotype assocIation with eXclusivity), which aims to identify subnetworks of genes whose genetic alterations are associated with a continuous cancer phenotype. Leveraging the properties of cancer mutations and the interactions among genes, we formulate the problem as an integer linear program and solve it optimally to obtain a set of associated genes. This algorithm solves a related but different mathematical problem from the one considered by UNCOVER, since it also takes into account functional relationships among genes, which are captured by an input network. Additionally, unlike UNCOVER, NETPHIX can identify mixed-sensitivity modules. Applied to a large-scale drug screening dataset, NETPHIX uncovered gene modules significantly associated with drug response, and many of these modules were also validated in an independent dataset. Because NETPHIX uses interaction information, its modules are functionally coherent and can thus provide important insights into drug action.
We also include a case study that combines NETPHIX with expression correlation analysis to investigate the genetic mutations associated with mutational signatures, yielding novel biological insight. Specifically, our analysis aims to answer the following two complementary questions: (i) which functional pathways have gene expression activities that correlate with the strengths of mutational signatures, and (ii) are there pathways whose genetic alterations might have led to specific mutational signatures? Analyzing a breast cancer dataset, we identified pathways associated with mutational signatures at both the expression and mutation levels, elucidating differences between related signatures.
UNCOVER and NETPHIX can be used directly for personalized cancer treatment by looking at the genomic alterations of patients and checking if these are correlated with drug sensitivity for any candidate treatment.
While these methods can lead to useful insight towards personalized drug treatment, our last contribution attempts to solve the more practical problem of predicting drug response based on all the information oncologists have available about the patient. This can include genetic information but more often consists of demographics, histology reports, baseline labs, and the medical history recorded in the patient's Electronic Health Record. We propose a framework to simultaneously predict multiple outcomes for each treatment: we are concerned not only with the expected survival time of the patient, but also with other relevant factors, such as quality of life and side effects, which we treat as important quantifiable outcomes. These outcomes are heavily correlated with each other, and this property can be leveraged to improve prediction performance over predicting each outcome separately. Furthermore, to the best of our knowledge, no published work or software package can handle a mix of survival, continuous, and categorical outcomes. We provide a unified framework for the prediction of heterogeneous outcomes in a clinical setting, leveraging an ensemble learning method known as random forests. We propose an updated node splitting rule that captures the heterogeneity of clinical outcomes.
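To make the idea of a split rule over heterogeneous outcomes more concrete, the sketch below scores a candidate split by averaging the relative impurity decrease across several outcome columns, using variance for continuous outcomes and Gini impurity for categorical ones. This is only an assumed, simplified stand-in and not the splitting rule proposed in this dissertation: the function names are hypothetical, equal weighting of outcomes is an arbitrary choice, and survival outcomes (which would typically need a censoring-aware statistic) are deliberately left out.

```python
from collections import Counter
from typing import Callable, List, Sequence


def variance(values: Sequence[float]) -> float:
    """Population variance, used as impurity for a continuous outcome."""
    n = len(values)
    if n == 0:
        return 0.0
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def gini(labels: Sequence[str]) -> float:
    """Gini impurity, used as impurity for a categorical outcome."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def relative_decrease(
    impurity: Callable[[Sequence], float],
    parent: Sequence,
    left: Sequence,
    right: Sequence,
) -> float:
    """Impurity decrease divided by parent impurity, so that outcomes
    measured on different scales become comparable."""
    base = impurity(parent)
    if base == 0.0:
        return 0.0
    weighted_children = (
        len(left) * impurity(left) + len(right) * impurity(right)
    ) / len(parent)
    return (base - weighted_children) / base


def composite_split_score(
    outcomes: List[Sequence],
    kinds: List[str],
    go_left: Sequence[bool],
) -> float:
    """Average relative impurity decrease over all outcome columns.

    outcomes: one sequence per outcome, each with one value per patient.
    kinds:    "continuous" or "categorical" for each outcome (assumed labels).
    go_left:  True where the candidate split sends a patient to the left child.
    """
    total = 0.0
    for column, kind in zip(outcomes, kinds):
        impurity = variance if kind == "continuous" else gini
        left = [v for v, is_left in zip(column, go_left) if is_left]
        right = [v for v, is_left in zip(column, go_left) if not is_left]
        total += relative_decrease(impurity, column, left, right)
    return total / len(outcomes)
```

Normalizing each outcome's decrease by its parent impurity keeps any single outcome from dominating the score simply because of its units. A tree builder would evaluate this score for each candidate split and keep the highest-scoring one.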