- Austin, George
- Park, Heekuk
- Meydan, Yoli
- Seeram, Dwayne
- Sezin, Tanya
- Lou, Yue
- Firek, Brian
- Morowitz, Michael
- Banfield, Jill
- Christiano, Angela
- Peer, Itsik
- Uhlemann, Anne-Catrin
- Shenhav, Liat
- Korem, Tal
Sequencing-based approaches for the analysis of microbial communities are susceptible to contamination, which can mask biological signals or generate artifactual ones. Methods for in silico decontamination using controls are routinely used, but they do not make optimal use of information shared across samples, and they cannot handle either taxa that only partially originate in contamination or the leakage of biological material into controls. Here we present Source tracking for Contamination Removal in microBiomes (SCRuB), a probabilistic in silico decontamination method that incorporates shared information across multiple samples and controls to precisely identify and remove contamination. We validate the accuracy of SCRuB in multiple data-driven simulations and experiments, including induced contamination, and demonstrate that it outperforms state-of-the-art methods by an average of 15-20 times. We showcase the robustness of SCRuB across multiple ecosystems, data types and sequencing depths. Demonstrating its applicability to microbiome research, SCRuB facilitates improved predictions of host phenotypes, most notably the prediction of treatment response in melanoma patients using decontaminated tumor microbiome data.
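
The problem of taxa that only partially originate in contamination can be illustrated with a toy example. The sketch below is not the SCRuB algorithm: it assumes a single sample, a contaminant profile that is known exactly, and a known contamination fraction `p`, whereas a probabilistic method would have to estimate these quantities from multiple samples and controls. All abundances and function names are hypothetical and chosen only for illustration.

```python
# Toy illustration only: NOT the SCRuB algorithm. All values are hypothetical.
# Relative abundances over four taxa. The second taxon is truly present in the
# sample AND present in the contaminant, so it only partially originates in
# contamination.
true_sample = [0.50, 0.30, 0.20, 0.00]
contaminant = [0.00, 0.50, 0.00, 0.50]
p = 0.2  # assumed (known) fraction of the observed sample that is contamination

# The observed profile is a mixture of the true community and the contaminant.
observed = [(1 - p) * t + p * c for t, c in zip(true_sample, contaminant)]


def normalize(values):
    """Rescale non-negative values so that they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


def drop_control_taxa(observed, control):
    """Naive filter: zero out every taxon detected in the control."""
    return normalize([0.0 if c > 0 else o for o, c in zip(observed, control)])


def subtract_mixture(observed, control, p):
    """Mixture-aware removal: subtract the contaminant's share of each taxon."""
    return normalize([max(o - p * c, 0.0) for o, c in zip(observed, control)])


naive = drop_control_taxa(observed, contaminant)
mixture = subtract_mixture(observed, contaminant, p)

print("naive  :", [round(x, 3) for x in naive])
print("mixture:", [round(x, 3) for x in mixture])
```

In this toy setting, the naive filter removes the second taxon entirely even though it is truly present in the sample, which distorts the remaining proportions, while subtracting the contaminant's share recovers the original profile. In practice neither `p` nor the contaminant profile is known in advance, which is where sharing information across samples and controls becomes relevant.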