Lawrence Berkeley National Laboratory
VIMSS Computational Core
- Author(s): Arkin, Adam P.
- et al.
The VIMSS Computational Core group is responsible for data management, data integration, data analysis, and comparative and evolutionary genomic analysis of the data for the VIMSS project. We have expanded and extended our existing tool sets for comparative and functional genomics to deal with new data produced by the VIMSS ESPP2 members. The Computational Core is developing methods to store and analyze diverse data sets, including microarrays, ChIP-chip arrays, tiling arrays, proteomics, metabolomics, metabolic flux, phylochips, metagenomic sequencing, genome sequencing, growth curves, phenotype arrays, knockout strain collections, and links to existing literature and web-based resources. Our analysis has been incorporated into our comparative and functional genomics website MicrobesOnline (http://www.microbesonline.org) and made available to the wider research community. By taking advantage of data integration across diverse functional and comparative datasets, we have been able to pursue large research projects in evolutionary and systems biology.
Data management, integration, and distribution are critical functions for all large projects. A primary goal of the Computational Core is to capture all experimental data from the ESPP2 investigators, including relevant metadata, raw data, and processed data, and to make these data sets available through intuitive queries. Our group has developed the Experimental Information and Data Repository (http://vimss.lbl.gov/EIDR/) and the MicrobesOnline database to provide this functionality. Researchers have access to datasets from biomass production, growth curves, image data, mass spec data, phenotype microarray data, and transcriptomic, proteomic, and metabolomic data. New functionality has been added for storing information on mutant strains, transposon knockout libraries, and protein complex data, along with new visualizations for assessing existing data sets such as the phenotype microarrays.
The Computational Core has focused on using the data generated by the ESPP2 project to understand the stress response of Desulfovibrio vulgaris Hildenborough (DvH). We work closely with the other core groups within the ESPP2 project to assist in data analysis. Over the past year, this has included co-culture laboratory evolution, 16S barcode data, and phenotype analysis for the Applied Environmental Microbiology Core (AEMC), and transcriptomic, tiling array, and metabolite analysis for the Functional Genomics Core (FGC).
New research being pursued this year by the Computational Core includes: development of new methods for data compendium analysis using biclustering, which combines transcriptomic, proteomic, interaction, and gene neighborhood data in order to predict regulatory structures; computational predictions of amino acid synthesis pathways in DvH (working closely with the FGC to verify predictions); evolutionary analysis of lineage-specific gene expansion across bacteria; sub-N-squared phylogenetic tree reconstruction by developing the FastTree program; and large-scale comparative metagenomic sequence analysis.
The MicrobesOnline database (http://www.microbesonline.org) currently holds over 1000 microbial genomes and will be updated semi-annually, providing an important comparative and functional genomics resource to the community. New functionality added this year includes fungal genomes and a framework for adding further eukaryotic genomes, an updated user interface for the phylogenetic-tree-based genome browser that allows users to view their genes and genomes of interest within an evolutionary framework, improved tools to compare multiple microarray expression data sets across genes and genomes, phylogenetic profile searches using our high-quality species tree, and addition of external microarray data from the Many Microbial Microarrays Database for bacteria and Yea