UC San Diego
Genetic regulation of RNA splicing and expression in cancer and stem cells
- Author(s): DeBoever, Christopher
- Advisor(s): Frazer, Kelly A
- et al.
A central question in genetics is how different classes of DNA variants affect RNA splicing and expression. While there has been substantial progress in associating single nucleotide polymorphisms and small indels with these phenotypes, only recently has affordable high-throughput sequencing provided the opportunity to assess the impact of somatic, rare, and copy number variants (CNVs) on RNA splicing and expression. In this thesis, I use high-throughput sequencing to investigate the effect of somatic variants in SF3B1 on RNA splicing and characterize the genetic regulation of gene expression in induced pluripotent stem cells (iPSCs). In the first part, I examine the effect of recurrent somatic mutations in the splicing factor SF3B1 on RNA splicing in three different cancer types and find that SF3B1 mutants use hundreds of cryptic 3’ splice sites that are rarely used in samples without SF3B1 mutations. Sequence properties of these cryptic 3’ splice sites suggest that altered sterics may allow their usage in SF3B1 mutants. I also identify several candidate genes with out-of-frame cryptic splice sites that are used in a majority of transcripts in the mutants and may contribute to oncogenesis. In the second part, I examine the genetic regulation of gene expression in a collection of 215 human iPSCs using transcriptome and whole genome sequencing. I identify expression quantitative trait loci (eQTLs) for nearly six thousand genes including markers of pluripotency such as POU5F1, LCK, IDO1, and CXCL5. A comparison to GTEx eQTLs reveals that iPSCs are well powered statistically for finding eQTLs and have a unique regulatory landscape. I identify biallelic and multiallelic CNV eQTLs and find that a substantial proportion of CNV eQTLs appear to affect intergenic regulatory regions. I also find that rare promoter variants weakly disrupt gene expression while rare CNVs that overlap genes tend to disrupt gene expression with relatively high effect sizes.
Overall, this thesis helps define the roles of somatic, rare, and copy number variants in the regulation of gene expression and splicing and provides key insights into SF3B1-mutated cancers and iPSCs as a model system for molecular association analyses.