eScholarship
Open Access Publications from the University of California

Deconvolute individual genomes from metagenome sequences through read clustering

Published Web Location

https://peerj.com/articles/8966/
No data is associated with this publication.
Abstract

Motivation: Metagenome assembly from short next-generation sequencing reads is challenging because of the data's large scale and the computational complexity involved. Clustering short reads before assembly makes it possible to assemble genomes in parallel downstream, with the assembly of each genome optimized individually. However, current read clustering methods suffer from either false negatives (under-clustering) or false positives (over-clustering).

Results: We previously developed SpaRC, a scalable read clustering method on Apache Spark that produces very few false positives. Here we extended its capability by adding a new method that further clusters small clusters. This method exploits statistics derived from multiple samples in a dataset to reduce under-clustering. Using a synthetic dataset from mouse gut microbiomes, we show that this method has the potential to cluster almost all of the reads from genomes with sufficient sequencing coverage. We also explored several clustering parameters that differentially affect genomes with different levels of sequencing coverage.

Availability: https://bitbucket.org/berkeleylab/jgi-sparc/

Contact: zhongwang@lbl.gov
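The abstract does not specify which multi-sample statistic is used to join small clusters. The sketch below illustrates the general idea under one assumed choice. Reads from the same genome should rise and fall together in abundance across samples, so each cluster's per-sample read counts form a profile. A small cluster is then merged with any cluster whose profile is strongly correlated (Pearson) with it. This is a hypothetical illustration, not SpaRC's implementation. The names `merge_small_clusters`, `min_size` and `min_corr`, the use of Pearson correlation, and the union-find merging are all assumptions made for this example.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length count vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile carries no cross-sample signal
    return sxy / math.sqrt(sxx * syy)


def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def merge_small_clusters(profiles, sizes, min_size=100, min_corr=0.95):
    """Hypothetical merge step.

    profiles: cluster id -> list of read counts, one entry per sample
    sizes:    cluster id -> number of reads in the cluster
    A pair is considered only if at least one cluster is "small"
    (fewer than min_size reads); it is merged when the profiles'
    Pearson correlation is at least min_corr.
    """
    parent = {cid: cid for cid in profiles}
    for a, b in combinations(sorted(profiles), 2):
        if sizes[a] >= min_size and sizes[b] >= min_size:
            continue  # leave pairs of large clusters untouched
        if pearson(profiles[a], profiles[b]) >= min_corr:
            ra, rb = find(parent, a), find(parent, b)
            if ra != rb:
                parent[rb] = ra
    groups = {}
    for cid in profiles:
        groups.setdefault(find(parent, cid), []).append(cid)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    # Toy data: four samples; "a1" is proportional to "A", "b1" roughly to "B".
    profiles = {
        "A": [100, 200, 50, 10],
        "a1": [10, 20, 5, 1],
        "B": [80, 5, 60, 120],
        "b1": [8, 1, 6, 12],
        "c1": [3, 3, 3, 3],
    }
    sizes = {"A": 5000, "a1": 40, "B": 4000, "b1": 30, "c1": 12}
    print(merge_small_clusters(profiles, sizes))
    # → [['A', 'a1'], ['B', 'b1'], ['c1']]
```

Pearson correlation is scale-invariant, so a small fragment of a genome and a large cluster from the same genome can match even though their absolute read counts differ. In this sketch, merging is transitive, so one small cluster correlated with two large clusters would join them. A real pipeline would need a guard against that kind of over-clustering.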
