Pre-mRNA splicing is a highly regulated step of gene expression and is commonly altered across cancers. However, the basis for these splicing alterations and the functional importance of cancer-associated spliced products remain largely unexplored. This work aims to address both questions.
We first focus on establishing the genetic basis for cancer-associated splicing alterations. As part of the Pan-Cancer Analysis of Whole Genomes (PCAWG) consortium, we demonstrate the impact of non-coding intronic mutations using matched whole-genome and RNA-sequencing data from 1,209 primary tumor samples spanning 27 cancer types. We identify intronic sites beyond the canonical acceptor and donor dinucleotides that are sensitive to mutation, including the branchpoint consensus sequence; such sites are typically missed by exome-sequencing-based tumor genotyping. We identify tumor suppressor genes and oncogenes with intronic mutations associated with substantial changes in splicing, recovering previously described alterations in the oncogene EZH2 and revealing uncharacterized changes in the oncogenes MET and HRAS. Altogether, this work provides the first estimates of the extent to which intronic mutations missed by exome-based genotyping contribute to splicing changes in cancer.
The second half of this work examines the fate and function of spliced products associated with lung adenocarcinoma mutations in the splicing factor U2AF1. We conduct high-throughput long-read cDNA sequencing in isogenic human bronchial epithelial cells with and without the U2AF1 S34F mutation. We demonstrate the utility of our long-read approach for transcriptome studies by identifying 49,366 novel isoforms exclusive to our approach. By comparing event-level alternative splicing changes with those from a short-read approach, we show that our long-read data robustly capture mutant U2AF1-associated transcriptome alterations. We identify isoform-level expression changes in 198 isoforms, including a novel lncRNA and isoforms of immune-related genes. Finally, we hypothesize a mechanism by which U2AF1 S34F alters translational control of genes by modulating isoform diversity.