eScholarship
Open Access Publications from the University of California

Parallel de Bruijn Graph Construction and Traversal for de Novo Genome Assembly

  • Author(s): Georganas, E; Buluç, A; Chapman, J; Oliker, L; Rokhsar, D; Yelick, K
  • Editor(s): Damkroger, Trish; Dongarra, Jack J; et al.

Published Web Location

https://crd.lbl.gov/assets/pubs_presos/sc14genome.pdf
Abstract

De novo whole genome assembly reconstructs genomic sequence from short, overlapping, and potentially erroneous fragments called reads. We study optimized parallelization of the most time-consuming phases of Meraculous, a state-of-the-art production assembler. First, we present a new parallel algorithm for k-mer analysis, a phase characterized by intensive communication and I/O requirements, and reduce the memory requirements by 6.93×. Second, we efficiently parallelize de Bruijn graph construction and traversal, which requires a distributed hash table and is a key component of most de novo assemblers. We provide a novel algorithm that leverages the one-sided communication capabilities of Unified Parallel C (UPC) to support the requisite fine-grained parallelism and avoid data hazards, and we analytically prove its scalability properties. Overall, results show unprecedented performance and efficient scaling on up to 15,360 cores of a Cray XC30, on the human genome as well as the challenging wheat genome, with runtimes improving from days to seconds.
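To make the phases named above concrete, here is a minimal, serial Python sketch of k-mer analysis followed by de Bruijn graph construction and traversal. It is an illustration under simplifying assumptions, not the Meraculous or UPC implementation. It ignores reverse complements and base-quality scores and treats any k-mer seen fewer than `min_count` times as a sequencing error. Contigs are built as unambiguous paths that stop at forks or dead ends. All function names (`count_kmers`, `owner`, `partition`, `build_graph`, `extend`, `traverse`) and the CRC32-based ownership rule are hypothetical stand-ins for the distributed hash table described in the paper.

```python
from collections import Counter
import zlib

BASES = "ACGT"


def count_kmers(reads, k):
    """k-mer analysis: count every length-k substring across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def owner(kmer, nprocs):
    """Hypothetical rule mapping a k-mer to the process that would store it."""
    return zlib.crc32(kmer.encode()) % nprocs


def partition(kmers, nprocs):
    """Split k-mers into per-process buckets, as a distributed hash table would."""
    buckets = [[] for _ in range(nprocs)]
    for km in kmers:
        buckets[owner(km, nprocs)].append(km)
    return buckets


def build_graph(counts, min_count):
    """Keep k-mers seen at least min_count times; record their one-base extensions."""
    solid = {km for km, c in counts.items() if c >= min_count}
    graph = {}
    for km in solid:
        left = [b for b in BASES if b + km[:-1] in solid]
        right = [b for b in BASES if km[1:] + b in solid]
        graph[km] = (left, right)
    return graph


def extend(graph, kmer, visited, direction):
    """Walk from kmer while each step is unambiguous; return the bases added."""
    bases = []
    cur = kmer
    while True:
        left, right = graph[cur]
        ext = right if direction == "right" else left
        if len(ext) != 1:
            break  # dead end or fork
        b = ext[0]
        nxt = cur[1:] + b if direction == "right" else b + cur[:-1]
        # The neighbour must also have a unique edge pointing back at cur.
        back = graph[nxt][0] if direction == "right" else graph[nxt][1]
        if len(back) != 1 or nxt in visited:
            break
        visited.add(nxt)
        bases.append(b)
        cur = nxt
    return bases


def traverse(graph):
    """Emit one contig per unambiguous path in the graph."""
    visited = set()
    contigs = []
    for seed in sorted(graph):
        if seed in visited:
            continue
        visited.add(seed)
        right = extend(graph, seed, visited, "right")
        left = extend(graph, seed, visited, "left")
        contigs.append("".join(reversed(left)) + seed + "".join(right))
    return contigs


if __name__ == "__main__":
    genome = "AACCGGTTAGC"
    reads = [genome, genome, "ACCGGTTCGC"]  # last read carries a sequencing error
    counts = count_kmers(reads, k=5)
    graph = build_graph(counts, min_count=2)
    print(partition(graph, nprocs=4))  # which hypothetical process owns each k-mer
    print(traverse(graph))  # prints ['AACCGGTTAGC']
```

In this toy run the erroneous read contributes k-mers such as `GGTTC` only once, so the count threshold discards them and the traversal recovers the original sequence as a single contig. The serial `visited` set is exactly the kind of shared state that, in a parallel setting, creates the data hazards the abstract mentions; the paper addresses this with UPC one-sided communication, which this sketch does not attempt to model.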
