Using RNA Sequencing Data to Detect Variants of Interest
The primary use of RNA sequencing (RNA-seq) is to investigate the transcriptome through differential gene expression analysis. For cancer and other genetic diseases, detecting variants in the genome is critical for understanding how these diseases begin and progress. Here, I present computational methods that use RNA-seq to detect disease-associated variants. We developed RNA-VACAY, a containerized high-throughput pipeline that automates somatic variant calling in RNA-seq data. We analyzed 1,349 RNA-seq samples from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Project and found that RNA-VACAY can accurately identify somatic variants of interest using tumor RNA-seq alone. The pipeline detects somatic variants without a matched normal sample, which is commonly unavailable in research or clinical settings. RNA-VACAY can also identify 5’ and 3’ UTR variants, which are overlooked when using whole-exome sequencing (WES) data. Additionally, we analyzed RNA-seq data to characterize splicing variants. We found a splice site variant associated with a previously detected variant of uncertain significance in a patient with an undiagnosed genetic disorder. We also developed a computational method for efficiently designing guide RNAs for a CRISPR/Cas9 screen to detect exon skipping events associated with tumor formation. Our work demonstrates the value of RNA-seq for detecting functional variants in genetic diseases.