High-Throughput Experimental and Computational Techniques for Assessing Genetic Variants in Microexon Splicing
- Burghard, Christina
- Advisor(s): Xiao, Xinshu
Abstract
The identification of causal genetic variants underlying human diseases and traits remains a major challenge in genomics. Progress towards this goal has been made possible by advances in high-throughput technologies and the resulting collection of large biological datasets. In my thesis work, I use both large-scale experimental and computational approaches to identify genetic variants that alter RNA splicing. I also use large-scale RNA-Seq data for variant discovery, focusing on an understudied class of splicing events known as microexons. Microexons are extremely short exons that pose unique challenges for quantification. To address this, I developed an optimized computational pipeline, described in Chapter 2, and validated it against long-read sequencing data to ensure its reliability. In Chapter 3, I explore the functional importance of microexons through analysis of multi-tissue RNA-Seq data from the GTEx consortium. This analysis identified thousands of highly conserved microexons, a finding consistent with recent evidence that disruption of microexon splicing is associated with neurological and muscular diseases. Finally, in Chapter 4, I expand on work using a massively parallel reporter assay (MPRA) to directly test thousands of sequence variants for their effect on splicing. Through a data science-driven approach, this thesis advances our understanding of the landscape of splice-disrupting variants, producing new tools as well as novel insights into the functional importance of microexons in human biology and disease.