eScholarship
Open Access Publications from the University of California

A unified sequence catalogue of over 280,000 genomes obtained from the human gut microbiome

Author(s):

  • Almeida, Alexandre
  • Nayfach, Stephen
  • Boland, Miguel
  • Strozzi, Francesco
  • Beracochea, Martin
  • Shi, Zhou Jason
  • Pollard, Katherine
  • Parks, Donovan
  • Hugenholtz, Philip
  • Segata, Nicola
  • Kyrpides, Nikos
  • Finn, Robert
  • et al.

Published Web Location

https://www.biorxiv.org/content/10.1101/762682v1
No data is associated with this publication.
Abstract

Comprehensive reference data is essential for accurate taxonomic and functional characterization of the human gut microbiome. Here we present the Unified Human Gastrointestinal Genome (UHGG) collection, a resource combining 286,997 genomes representing 4,644 prokaryotic species from the human gut. These genomes contain over 625 million protein sequences used to generate the Unified Human Gastrointestinal Protein (UHGP) catalogue, a collection that more than doubles the number of gut protein clusters over the Integrated Gene Catalogue. We find that a large portion of the human gut microbiome remains to be fully explored, with over 70% of the UHGG species lacking cultured representatives, and 40% of the UHGP missing meaningful functional annotations. Intra-species genomic variation analyses revealed a large reservoir of accessory genes and single-nucleotide variants, many of which were specific to individual human populations. These freely available genomic resources should greatly facilitate investigations into the human gut microbiome.
