eScholarship
Open Access Publications from the University of California

CheckV: assessing the quality of metagenome-assembled viral genomes

Author(s):

  • Nayfach, Stephen
  • Camargo, Antonio Pedro
  • Eloe-Fadrosh, Emiley
  • Roux, Simon
  • Kyrpides, Nikos
  • et al.

Published Web Location

https://www.biorxiv.org/content/10.1101/2020.05.06.081778v1
No data is associated with this publication.
Abstract

Over the last several years, metagenomics has enabled the assembly of millions of new viral sequences that have vastly expanded our knowledge of Earth’s viral diversity. However, these sequences range from small fragments to complete genomes, and no tools currently exist for estimating their quality. To address this problem, we developed CheckV, an automated pipeline for estimating the completeness of viral genomes and for identifying and removing non-viral regions found on integrated proviruses. After validating the approach on mock datasets, we applied CheckV to large and diverse viral genome collections, including IMG/VR and the Global Ocean Virome, revealing that the majority of viral sequences were small fragments, with just 3.6% classified as high-quality (i.e., >90% completeness) or complete genomes. Additionally, we found that removing host contamination significantly improved the identification of auxiliary metabolic genes and the interpretation of viral-encoded functions. We expect CheckV to be broadly useful for all researchers studying and reporting viral genomes assembled from metagenomes. CheckV is freely available at http://bitbucket.org/berkeleylab/CheckV.
