eScholarship
Open Access Publications from the University of California

Nanopore sequencing and the Shasta toolkit enable efficient de novo assembly of eleven human genomes.

Author(s):
  • Shafin, Kishwar
  • Pesout, Trevor
  • Lorig-Roach, Ryan
  • Haukness, Marina
  • Olsen, Hugh E
  • Bosworth, Colleen
  • Armstrong, Joel
  • Tigyi, Kristof
  • Maurer, Nicholas
  • Koren, Sergey
  • Sedlazeck, Fritz J
  • Marschall, Tobias
  • Mayes, Simon
  • Costa, Vania
  • Zook, Justin M
  • Liu, Kelvin J
  • Kilburn, Duncan
  • Sorensen, Melanie
  • Munson, Katy M
  • Vollger, Mitchell R
  • Monlong, Jean
  • Garrison, Erik
  • Eichler, Evan E
  • Salama, Sofie
  • Haussler, David
  • Green, Richard E
  • Akeson, Mark
  • Phillippy, Adam
  • Miga, Karen H
  • Carnevali, Paolo
  • Jain, Miten
  • Paten, Benedict
  • et al.
Abstract

De novo assembly of a human genome using nanopore long-read sequences has been reported, but it used more than 150,000 CPU hours and weeks of wall-clock time. To enable rapid human genome assembly, we present Shasta, a de novo long-read assembler, and polishing algorithms named MarginPolish and HELEN. Using a single PromethION nanopore sequencer and our toolkit, we assembled 11 highly contiguous human genomes de novo in 9 d. We achieved roughly 63× coverage, 42-kb read N50 values and 6.5× coverage in reads >100 kb using three flow cells per sample. Shasta produced a complete haploid human genome assembly in under 6 h on a single commercial compute node. MarginPolish and HELEN polished haploid assemblies to more than 99.9% identity (Phred quality score QV = 30) with nanopore reads alone. Addition of proximity-ligation sequencing enabled near chromosome-level scaffolds for all 11 genomes. We compare our assembly performance to existing methods for diploid, haploid and trio-binned human samples and report superior accuracy and speed.
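The abstract reports three kinds of summary statistic: coverage, read N50, and identity expressed as a Phred quality value. The sketch below shows the standard definitions of these quantities in Python. It illustrates the arithmetic only and is not the authors' code; the function names are hypothetical, and the genome size passed to `coverage` is whatever the caller assumes.

```python
import math


def phred_qv(identity):
    """Convert a fractional identity (e.g. 0.999) to a Phred-scaled quality value.

    QV = -10 * log10(error rate), where error rate = 1 - identity.
    """
    if not 0.0 <= identity < 1.0:
        raise ValueError("identity must be in the range [0, 1)")
    return -10.0 * math.log10(1.0 - identity)


def read_n50(lengths):
    """Return the N50 of a collection of read (or contig) lengths.

    N50 is the length L such that reads of length >= L together
    contain at least half of all sequenced bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest read down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


def coverage(lengths, genome_size):
    """Return mean depth of coverage: total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(lengths) / genome_size
```

Under these definitions, an identity of 99.9% corresponds to an error rate of 1 in 1,000, which is the QV of 30 quoted in the abstract; `phred_qv(0.999)` evaluates to approximately 30.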
