Clustering of mRNA-Seq Data for Detection of Alternative Splicing Patterns
- Johnson, Marla
- Advisor(s): Purdom, Elizabeth
Abstract
Whereas prior methods of studying expression in a cell returned only estimates of gene expression, sequencing of mRNA can provide estimates of the abundance of individual isoforms within the cell. As a result, many standard statistical methods commonly used for analyzing gene expression levels need to be modified to take advantage of this additional information. Many methods have been developed to study differential isoform expression between known groups, but little research has applied methods of unsupervised learning, such as clustering. One novel question is whether we can find clusters of samples that are distinguishable not by their gene expression but by their isoform usage. That is, instead of using clustering to find groups with shared changes in gene expression, we want to use clustering to find groups with shared changes in isoform usage. Here, we propose a novel approach to clustering mRNA-Seq data that identifies such clusters. To use both gene and isoform information when clustering, we represent the sequencing data for each gene as a vector giving the relative usage of each of its isoforms. In simulated data, we show that clustering on relative isoform usage values rather than isoform counts is more sensitive in detecting clusters based on changes in isoform usage. In a real data set, we demonstrate its performance in finding a technical artifact that resulted in different batches having different isoform usage patterns. We also illustrate its use on several TCGA data sets. Specifically, we examined whether groups determined from clustering on relative isoform usage were associated with tumor stage or splicing mutations.
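As a rough illustration of the distinction the abstract draws between isoform counts and relative isoform usage, the sketch below converts a small count matrix into within-gene proportions. It is not the thesis's implementation: the function name `relative_isoform_usage`, the toy counts, and the choice to mark genes with zero total counts as NaN are all assumptions made for this example.

```python
import numpy as np

def relative_isoform_usage(counts, gene_ids):
    """Hypothetical helper: convert isoform counts to within-gene proportions.

    counts   : array of shape (n_isoforms, n_samples)
    gene_ids : length n_isoforms, the gene each isoform belongs to
    Returns an array of the same shape; entries are NaN where a gene has
    zero total counts in a sample (an assumption of this sketch).
    """
    counts = np.asarray(counts, dtype=float)
    gene_ids = np.asarray(gene_ids)
    usage = np.full(counts.shape, np.nan)
    for gene in np.unique(gene_ids):
        rows = gene_ids == gene
        totals = counts[rows].sum(axis=0)   # per-sample total for this gene
        expressed = totals > 0              # avoid dividing by zero
        usage[np.ix_(rows, expressed)] = counts[rows][:, expressed] / totals[expressed]
    return usage

# Toy data: gene g1 has two isoforms, gene g2 has one.
# Sample B doubles sample A's counts for g1 (same usage, higher expression);
# sample C has the same g1 total as sample A but the opposite isoform usage.
counts = [
    [10, 20, 30],   # g1, isoform 1
    [30, 60, 10],   # g1, isoform 2
    [5,  0,  8],    # g2, isoform 1
]
gene_ids = ["g1", "g1", "g2"]

usage = relative_isoform_usage(counts, gene_ids)
print(usage)
```

In this toy case samples A and B receive identical usage vectors for g1 despite differing gene expression, while sample C differs, which is the kind of signal that clustering on relative usage rather than raw counts is meant to pick up.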