eScholarship
Open Access Publications from the University of California

UC Riverside

UC Riverside Previously Published Works

A Metagenomic Analysis of Environmental and Clinical Samples Using a Secure Hybrid Cloud Solution.

Abstract

The number and types of studies on the human microbiome, metagenomics, personalized medicine, and clinical genomics are increasing at an unprecedented rate, leading to computational challenges. For example, the analysis of patient or clinical samples requires methods capable of (i) accurately detecting pathogenic organisms, (ii) running fast enough to allow short response times and rapid diagnosis, and (iii) scaling to ever-growing databases of reference genomes. While cloud computing has the potential to offer low-cost solutions to these needs, serious concerns about the protection of genomic data remain because of the lack of control and security in remote genomic databases. We present a novel metagenomic analysis system called "Virgile" that is capable of performing privacy-preserving queries on databases hosted on outsourced servers (e.g., public or cloud-based). The method takes as input the sequencing data produced by any modern sequencing instrument (e.g., Illumina, PacBio, Oxford Nanopore) and outputs the microbial profile using a database of whole-genome sequences (e.g., the RefSeq database from NCBI). The profiling algorithm uses a genome-centric approach and aims to estimate without bias the abundance of the microorganisms present. Using an extensive set of 65 simulated datasets, negative and positive controls, real clinical samples, and mock communities, we show that Virgile identifies the organisms present in environmental or clinical samples and estimates their abundance with high accuracy compared to popular state-of-the-art methods, including MetaPhlAn2 and KrakenUniq. Virgile runs at high speed and can be run on a standard laptop with 8 GB of RAM. Virgile is a novel privacy-preserving abundance estimation algorithm that can efficiently and rapidly determine the taxonomic identity and abundance of organisms present in a metagenomic sample, including samples from medical environments.
To the best of our knowledge, Virgile is the only metagenome analysis system leveraging cloud computing in a secure manner.
