Function and Regulation of Nucleotide Variants in RNA
RNA molecules harbor the information necessary for the synthesis of proteins and are essential to a wide variety of cellular processes. Variation in RNA sequences results in significant phenotypic differences; however, the precise relationship between the two remains largely unknown. Thanks to the advent of high-throughput sequencing technologies, we now have the opportunity to study the transcriptome in unprecedented detail and characterize the many different types of variants present in RNA. In the present work, we developed novel computational approaches and performed in-depth analysis of RNA-sequencing (RNA-seq) data with the overarching goal of studying the function and regulation of nucleotide variants in RNA. We first aimed to understand the factors that regulate adenosine-to-inosine (A-to-I) editing, the most prevalent type of non-genetic nucleotide variant in human RNAs. We analyzed bulk RNA-seq data obtained following the individual knockdown of over two hundred RNA-binding proteins (RBPs). This allowed us to study their role in the regulation of A-to-I editing at a transcriptome-wide scale. We identified several RBPs, including DROSHA, ILF2/3, TROVE2, and TARDBP, that significantly alter editing levels through various mechanisms, including direct targeting of ADAR1 expression, protein-protein interaction, and direct binding to edited regions. Next, to study the effect of nucleotide variants, we made use of single-cell RNA-sequencing (scRNA-seq) data. This technology offers a unique glimpse of the transcriptome at single-cell resolution. However, identification of nucleotide variants in scRNA-seq data remains challenging, and very few methods are available for this purpose. Here, we present scAllele, a novel method that detects both single nucleotide variants (SNVs) and microindels in scRNA-seq data with high accuracy and sensitivity.
In addition, scAllele identifies functional relationships between the identified variants and alternative RNA processing. We applied scAllele to scRNA-seq data derived from lung cancer patients (matched tumor and normal samples) and detected over 150 allele-specific splicing events that were unique to one condition or showed differential prevalence between the two. Building on scAllele, we further developed a new method, T-Allele, to identify nucleotide variants and their linkage patterns in third-generation RNA-seq data. We demonstrated that the precision of variant calls by T-Allele is robust despite the relatively high sequencing error rate of this type of data. Using T-Allele, we identified up to 44 haplotype-specific alternative splicing events in each of the 8 cell lines included in our study. We also showed T-Allele's ability to distinguish alternative splicing events that are genetically regulated from those whose regulation involves other factors.