Li, Wei

RNA-Seq Based Transcriptome Assembly: Sparsity, Bias Correction and Multiple Sample Comparison

2012

Advisor(s): Jiang, Tao

Abstract

RNA-Seq, or deep sequencing of RNAs, is a new technology for transcriptome profiling using second-generation sequencing. RNA-Seq has been widely used to identify and quantify transcriptomes at unprecedentedly high resolution and low cost. An important computational problem arising from RNA-Seq is transcriptome assembly, in which the structures of transcripts and their expression levels are inferred simultaneously from RNA-Seq data. RNA-Seq transcriptome assembly enables the detection of structural and quantitative changes in transcripts between samples, paving the way for novel biological discoveries. However, RNA-Seq transcriptome assembly is challenging because: (i) the complicated alternative splicing patterns of some genes result in a huge number of possible transcripts; (ii) various biases in RNA-Seq reads (including sequencing, positional and mappability biases) reduce the accuracy of assembly and expression level estimation algorithms; and (iii) existing assembly tools can reconstruct transcripts from only a single sample, leading to a high false positive rate when RNA-Seq experiments from multiple samples are compared.

We propose three algorithms to address these challenges. First, we design a transcriptome assembly tool, IsoLasso, that balances different objectives (prediction accuracy, sparsity and interpretation) and takes advantage of the sparsity of expressed transcripts. Second, we use the quasi-multinomial distribution to model RNA-Seq biases, and design a new algorithm, CEM, to handle these biases in both transcriptome assembly and transcript expression level estimation. Finally, we propose a multiple-sample transcriptome assembly tool, ISP, that assembles transcripts directly from RNA-Seq data of multiple samples. ISP outperforms assembly tools that consider one sample at a time, and helps improve the accuracy of downstream differential analysis of transcriptomes between samples.
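As its name suggests, IsoLasso builds on Lasso (L1-penalized) regression, which is how it exploits the sparsity of expressed transcripts. The sketch below is a generic illustration under stated assumptions, not the IsoLasso implementation. It estimates non-negative abundances `x` for a hypothetical set of candidate transcripts from per-segment read counts `y`. Here `A[i][j] = 1` if candidate `j` covers exon segment `i`, and the objective is `0.5*||y - A x||^2 + lam * sum(x)` with `x >= 0`, solved by cyclic coordinate descent. The function name `nonneg_lasso`, the 0/1 coverage matrix and the value of `lam` are all illustrative assumptions.

```python
def nonneg_lasso(A, y, lam, n_iter=500):
    """Hypothetical helper: minimize 0.5*||y - A x||^2 + lam*sum(x), x >= 0,
    by cyclic coordinate descent (a generic non-negative Lasso, not IsoLasso)."""
    m, n = len(A), len(A[0])
    x = [0.0] * n
    r = list(y)  # residual y - A x; equals y while x is all zeros
    col_sq = [sum(A[i][j] ** 2 for i in range(m)) for j in range(n)]
    for _ in range(n_iter):
        for j in range(n):
            if col_sq[j] == 0:
                continue  # candidate covers no segment; leave it at zero
            # Correlation of column j with the residual that excludes x_j's own contribution.
            rho = sum(A[i][j] * r[i] for i in range(m)) + col_sq[j] * x[j]
            # Soft-threshold by lam and clip at zero (non-negativity constraint).
            new = max(0.0, (rho - lam) / col_sq[j])
            delta = new - x[j]
            if delta != 0.0:
                for i in range(m):
                    r[i] -= A[i][j] * delta  # keep residual consistent with updated x
                x[j] = new
    return x


# Toy example (hypothetical data): 3 exon segments, 3 candidate transcripts.
# T1 covers segments 1-3, T2 skips segment 2, T3 skips segment 1.
A = [[1, 1, 0],
     [1, 0, 1],
     [1, 1, 1]]
y = [10.0, 10.0, 10.0]  # reads per segment, consistent with only T1 expressed
x = nonneg_lasso(A, y, lam=0.1)
print([round(v, 3) for v in x])  # [9.967, 0.0, 0.0]
```

The L1 penalty drives the abundances of the two unsupported candidates exactly to zero and shrinks the expressed one slightly. A sparse solution like this is what makes the selected transcript set interpretable.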


UC Riverside
