Open Access Publications from the University of California

UC Santa Cruz

UC Santa Cruz Electronic Theses and Dissertations

Comparative Analysis of Long-Read Transcriptome Assembly Pipelines

Creative Commons 'BY' version 4.0 license

Long-read sequencing can overcome some of the barriers in transcriptome assembly that plague short-read-based technologies. Because of their length, short reads fail to span entire transcripts, which makes it difficult to discern the correct splice junctions. Long reads, by contrast, can span entire transcripts end to end and thus circumvent the problem of inferring splice junctions. Multiple long-read transcriptome assembly pipelines have been developed in recent years, but there is no comprehensive analysis comparing them. Some of these pipelines implement novel approaches to generating transcriptomes from long reads, while others adapt methods originally developed for short-read-based transcriptome assembly. We show that there are significant differences among transcriptomes assembled from the same data by different assembly pipelines. Our analysis further shows that high-level summary statistics can be misleading about transcriptome quality, and it highlights the importance of using internal controls to validate transcriptome assemblies.
