Skip to main content
eScholarship
Open Access Publications from the University of California

UC Berkeley

UC Berkeley Previously Published Works bannerUC Berkeley

The M5nr: a novel non-redundant database containing protein sequences and annotations from multiple sources and associated tools.

Abstract

Background

Computing of sequence similarity results is becoming a limiting factor in metagenome analysis. Sequence similarity search results encoded in an open, exchangeable format have the potential to limit the needs for computational reanalysis of these data sets. A prerequisite for sharing of similarity results is a common reference.

Description

We introduce a mechanism for automatically maintaining a comprehensive, non-redundant protein database and for creating a quarterly release of this resource. In addition, we present tools for translating similarity searches into many annotation namespaces, e.g. KEGG or NCBI's GenBank.

Conclusions

The data and tools we present allow the creation of multiple result sets using a single computation, permitting computational results to be shared between groups for large sequence data sets.

Many UC-authored scholarly publications are freely available on this site because of the UC's open access policies. Let us know how this access is important for you.

Main Content
For improved accessibility of PDF content, download the file to your device.
Current View