- Rusch, DB;
- Halpern, AL;
- Sutton, G;
- Heidelberg, KB;
- Williamson, S;
- Yooseph, S;
- Wu, D;
- Eisen, JA;
- Hoffman, JM;
- Remington, K;
- Beeson, K;
- Tran, B;
- Smith, H;
- Baden-Tillson, H;
- Stewart, C;
- Thorpe, J;
- Freeman, J;
- Andrews-Pfannkoch, C;
- Venter, JE;
- Li, K;
- Kravitz, S;
- Heidelberg, JF;
- Utterback, T;
- Rogers, YH;
- Falcón, LI;
- Souza, V;
- Bonilla-Rosso, G;
- Eguiarte, LE;
- Karl, DM;
- Sathyendranath, S;
- Platt, T;
- Bermingham, E;
- Gallardo, V;
- Tamayo-Castillo, G;
- Ferrari, MR;
- Strausberg, RL;
- Nealson, K;
- Friedman, R;
- Frazier, M;
- Venter, JC
The world's oceans contain a complex mixture of micro-organisms that are, for the most part, uncharacterized both genetically and biochemically. We report here a metagenomic study of the marine planktonic microbiota in which surface (mostly marine) water samples were analyzed as part of the Sorcerer II Global Ocean Sampling expedition. These samples, collected across a several-thousand km transect from the North Atlantic through the Panama Canal and ending in the South Pacific, yielded an extensive dataset consisting of 7.7 million sequencing reads (6.3 billion bp). Though a few major microbial clades dominate the planktonic marine niche, the dataset contains great diversity, with 85% of the assembled sequence and 57% of the unassembled data being unique at a 98% sequence identity cutoff. Using the metadata associated with each sample and sequencing library, we developed new comparative genomic and assembly methods. One comparative genomic method, termed "fragment recruitment," addressed questions of genome structure, evolution, and taxonomic or phylogenetic diversity, as well as the biochemical diversity of genes and gene families. A second method, termed "extreme assembly," made possible the assembly and reconstruction of large segments of abundant but clearly nonclonal organisms. Within all abundant populations analyzed, we found extensive intra-ribotype diversity in several forms: (1) extensive sequence variation within orthologous regions throughout a given genome (despite coverage of individual ribotypes approaching 500-fold, most individual sequencing reads are unique); (2) numerous changes in gene content, some with direct adaptive implications; and (3) hypervariable genomic islands that are too variable to assemble. The intra-ribotype diversity is organized into genetically isolated populations that have overlapping but independent distributions, implying distinct environmental preferences.
We present novel methods for measuring the genomic similarity between metagenomic samples and show how the samples may be grouped into several community types. Specific functional adaptations can be identified both within individual ribotypes and across the entire community, including proteorhodopsin spectral tuning and the presence or absence of the phosphate-binding gene PstS.