eScholarship
Open Access Publications from the University of California

Hybrid error correction and de novo assembly of single-molecule sequencing reads

Author(s):

  • Koren, S
  • Schatz, MC
  • Walenz, BP
  • Martin, J
  • Howard, JT
  • Ganapathy, G
  • Wang, Z
  • Rasko, DA
  • McCombie, WR
  • Jarvis, ED
  • Phillippy, AM
  • et al.

Published Web Location

https://doi.org/10.1038/nbt.2280

Abstract

Single-molecule sequencing instruments can generate multikilobase sequences with the potential to greatly improve genome and transcriptome assembly. However, the error rates of single-molecule reads are high, which has limited their use thus far to resequencing bacteria. To address this limitation, we introduce a correction algorithm and assembly strategy that uses short, high-fidelity sequences to correct the error in single-molecule sequences. We demonstrate the utility of this approach on reads generated by a PacBio RS instrument from phage, prokaryotic and eukaryotic whole genomes, including the previously unsequenced genome of the parrot Melopsittacus undulatus, as well as for RNA-Seq reads of the corn (Zea mays) transcriptome. Our long-read correction achieves >99.9% base-call accuracy, leading to substantially better assemblies than current sequencing strategies: in the best example, the median contig size was quintupled relative to high-coverage, second-generation assemblies. Greater gains are predicted if read lengths continue to increase, including the prospect of single-contig bacterial chromosome assembly.

© 2012 Nature America, Inc. All rights reserved.
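
The idea of using short, high-fidelity sequences to correct a long, error-prone read can be illustrated with a toy majority-vote consensus. This is only a minimal sketch under strong assumptions, not the authors' algorithm: it assumes each short read has already been placed on the long read at a known offset, it handles substitution errors only (no insertions or deletions), and the names `correct_long_read`, `placements`, and `min_support` are hypothetical.

```python
from collections import Counter


def correct_long_read(long_read, placements, min_support=2):
    """Return a substitution-corrected copy of long_read.

    placements is a list of (offset, short_read) pairs; each short read is
    assumed to align gap-free to long_read starting at offset (hypothetical
    input format chosen for illustration).
    """
    # One vote counter per long-read position.
    votes = [Counter() for _ in long_read]
    for offset, short_read in placements:
        for i, base in enumerate(short_read):
            pos = offset + i
            if 0 <= pos < len(long_read):
                votes[pos][base] += 1

    corrected = []
    for original, counter in zip(long_read, votes):
        if counter:
            base, count = counter.most_common(1)[0]
            # Trust the short-read consensus only when enough reads agree;
            # otherwise keep the original long-read base.
            corrected.append(base if count >= min_support else original)
        else:
            corrected.append(original)
    return "".join(corrected)


if __name__ == "__main__":
    noisy = "ACGTTCGA"  # long read with one substitution error at index 4
    shorts = [(0, "ACGTA"), (2, "GTACG"), (3, "TACGA")]
    print(correct_long_read(noisy, shorts))
```

In this sketch, positions covered by fewer than `min_support` agreeing short reads are left unchanged, which reflects the general intuition that correction quality depends on short-read coverage.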
