- Paez-Espino, David;
- Chen, I-Min A;
- Palaniappan, Krishna;
- Ratner, Anna;
- Chu, Ken;
- Szeto, Ernest;
- Pillay, Manoj;
- Huang, Jinghua;
- Markowitz, Victor M;
- Nielsen, Torben;
- Huntemann, Marcel;
- Reddy, TBK;
- Pavlopoulos, Georgios A;
- Sullivan, Matthew B;
- Campbell, Barbara J;
- Chen, Feng;
- McMahon, Katherine;
- Hallam, Steve J;
- Denef, Vincent;
- Cavicchioli, Ricardo;
- Caffrey, Sean M;
- Streit, Wolfgang R;
- Webster, John;
- Handley, Kim M;
- Salekdeh, Ghasem H;
- Tsesmetzis, Nicolas;
- Setubal, Joao C;
- Pope, Phillip B;
- Liu, Wen-Tso;
- Rivers, Adam R;
- Ivanova, Natalia N;
- Kyrpides, Nikos C
Viruses represent the most abundant life forms on the planet. Recent experimental and computational improvements have led to a dramatic increase in the number of viral genome sequences identified primarily from metagenomic samples. This expanding catalog of metagenomic viral sequences calls for a comprehensive computational platform that integrates these sequences with associated metadata and analytical tools. Here we present IMG/VR (https://img.jgi.doe.gov/vr/), the largest publicly available database of viral sequences, which combines 3908 isolate reference DNA viruses with 264 413 computationally identified viral contigs from >6000 ecologically diverse metagenomic samples. Approximately half of the viral contigs are grouped into genetically distinct quasi-species clusters. Microbial hosts are predicted for 20 000 viral sequences, revealing nine microbial phyla previously unreported to be infected by viruses. Viral sequences can be queried by a variety of associated metadata, including the habitat type and geographic location of the samples, or by taxonomic classification according to hallmark viral genes. IMG/VR has a user-friendly interface that allows users to interrogate all integrated data and to compare them with external sequences, thus serving as an essential resource for the viral genomics community.