Alternative splicing plays a crucial role in increasing protein diversity and in regulating gene expression at the post-transcriptional level. In humans, almost all genes produce more than one mRNA isoform and, while the fraction varies, many other species also have a substantial number of alternatively spliced genes. Alternative splicing is regulated by splicing factors, often in a manner specific to developmental stage or tissue. Mis-regulation of alternative splicing, via mutations in splice sites, splicing regulatory elements, or splicing factors, can lead to disease states, including cancers. Thus, characterizing how alternative splicing shapes the transcriptome will lead to greater insight into the regulation of numerous cellular pathways and many aspects of human health.
A critical tool for investigating alternative splicing is high-throughput mRNA sequencing (RNA-seq). This technology produces hundreds of millions of short (~100 bp) sequencing reads from mRNA molecules and can be used both to discover novel transcripts and to quantify transcript expression. While short read length is a limitation of the technology in its current form, RNA-seq has resulted in the discovery of hundreds of thousands of new transcripts and has revealed an increased complexity of the transcriptome via alternative splicing, particularly in humans. Here, I used RNA-seq analysis to investigate the global effect of post-transcriptional regulation via alternative splicing coupled to nonsense-mediated mRNA decay and to examine natural human variation in alternative splicing, particularly in genes associated with differential therapeutic drug response.
The nonsense-mediated mRNA decay (NMD) pathway, which degrades transcripts containing a premature termination codon, plays an important role in post-transcriptional gene regulation when coupled to alternative splicing. If a gene produces an alternative isoform that is targeted by NMD, the mRNA abundance of the protein-producing transcripts can be regulated post-transcriptionally at the level of alternative splicing. This has been shown to be important in the regulation of a number of genes, including many of the splicing factors themselves. I used RNA-seq analysis of cells in which NMD had been inhibited to discover alternative isoforms that are NMD targets on a genome-wide scale in humans and in a diverse set of other eukaryotic species. I found that around 20% of expressed human genes are potentially regulated by alternative splicing coupled to NMD and that they fall into many different functional categories. I also found that hundreds to thousands of genes produce NMD-targeted alternative isoforms in each of frog, zebrafish, fly, fission yeast, and plant, highlighting the prevalence of this relatively under-studied mode of gene regulation across the three major branches of eukaryotic organisms. I also gained insight into the features that define NMD targets, which are thought to vary between species, although the question remains unsettled in the field. I found that an exon-exon junction downstream of the termination codon is a much stronger predictor of NMD than 3’ UTR length in every species except yeast.
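As a rough illustration of the two transcript features compared above, the following Python sketch computes, for a single transcript, whether an exon-exon junction lies downstream of the termination codon and how long the 3' UTR is. The function names, the transcript-coordinate representation, and the 50-nt minimum distance (the commonly cited "50-nucleotide rule") are assumptions made for this example; they are not taken from the analysis described here.

```python
from typing import List


def junction_positions(exon_lengths: List[int]) -> List[int]:
    """Return transcript coordinates of exon-exon junctions.

    exon_lengths lists exon sizes in 5'->3' order; a junction sits at the
    cumulative length of all exons before it (0-based, end-exclusive).
    """
    positions = []
    offset = 0
    for length in exon_lengths[:-1]:  # the last exon ends the transcript
        offset += length
        positions.append(offset)
    return positions


def has_downstream_junction(
    exon_lengths: List[int],
    stop_codon_end: int,
    min_distance: int = 50,  # assumed threshold, not from the source text
) -> bool:
    """True if any junction lies at least min_distance nt past the stop codon.

    stop_codon_end is the transcript coordinate of the first base after the
    termination codon (hypothetical convention for this sketch).
    """
    if not 0 <= stop_codon_end <= sum(exon_lengths):
        raise ValueError("stop_codon_end lies outside the transcript")
    return any(
        junction - stop_codon_end >= min_distance
        for junction in junction_positions(exon_lengths)
    )


def utr3_length(exon_lengths: List[int], stop_codon_end: int) -> int:
    """Length of the 3' UTR: everything after the termination codon."""
    return sum(exon_lengths) - stop_codon_end


# Three exons of 200, 150 and 300 nt give junctions at 200 and 350.
exons = [200, 150, 300]
premature_stop = has_downstream_junction(exons, stop_codon_end=250)
normal_stop = has_downstream_junction(exons, stop_codon_end=400)
```

In this toy transcript, a stop codon ending at position 250 leaves the junction at 350 a full 100 nt downstream, so the isoform is flagged as a candidate NMD target, whereas a stop codon ending at position 400 sits in the last exon and is not flagged. A real analysis would derive these coordinates from a transcript annotation and would need to handle strand and incomplete coding sequences.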
I also used RNA-seq to investigate alternative splicing in genes of pharmacologic importance. Natural human variation in the expression level and activity of genes involved in drug disposition and action (“pharmacogenes”) can affect drug response and toxicity. Previous studies have relied primarily on microarrays to understand gene expression differences, or have focused on a single tissue or a small number of samples. Here, we used RNA-seq to determine the expression levels and alternative splicing of 389 selected pharmacogenes across four pharmacologically relevant tissues (liver, kidney, heart, and adipose) and lymphoblastoid cell lines (LCLs), which are widely used in pharmacogenomics studies. Analysis of data from 18 different individuals for each of the 5 sample types (90 samples in total) revealed substantial variation in both expression levels and splicing across samples and tissue types. Comparison with an independent RNA-seq dataset yielded a consistent picture. This in-depth exploration also revealed 183 previously unannotated splicing events in pharmacogenes. Overall, this study serves as a rich resource for the research community to inform biomarker and drug discovery and use.
In conclusion, the roles of alternative splicing and NMD in the regulation of cellular processes and in human health remain wide-open but critical areas of study. Advances in sequencing technologies have had, and will continue to have, a major impact on the study of these mechanisms. New long-read technologies will likely soon be readily available and promise to greatly increase our ability to interpret RNA-seq results accurately. As the cost of sequencing continues to decrease, more and more data will be generated, allowing for a better view of how the transcriptome varies between individuals and shapes differential disease risks and drug responses.