Skip to main content
eScholarship
Open Access Publications from the University of California

UC Irvine

UC Irvine Previously Published Works bannerUC Irvine

TranscriptClean: variant-aware correction of indels, mismatches and splice junctions in long-read transcripts.

Abstract

MOTIVATION: Long-read, single-molecule sequencing platforms hold great potential for isoform discovery and characterization of multi-exon transcripts. However, their high error rates are an obstacle to distinguishing novel transcript isoforms from sequencing artifacts. Therefore, we developed the package TranscriptClean to correct mismatches, microindels and noncanonical splice junctions in mapped transcripts using the reference genome while preserving known variants. RESULTS: Our method corrects nearly all mismatches and indels present in a publically available human PacBio Iso-seq dataset, and rescues 39% of noncanonical splice junctions. AVAILABILITY AND IMPLEMENTATION: All Python and R scripts used in this paper are available at https://github.com/dewyman/TranscriptClean.

Many UC-authored scholarly publications are freely available on this site because of the UC's open access policies. Let us know how this access is important for you.

Main Content
For improved accessibility of PDF content, download the file to your device.
Current View