Lawrence Berkeley National Laboratory
ESPP Computational Core Overview
- Author(s): Dehal, Paramvir S.
- et al.
Background: The VIMSS Computational Core group is responsible for data management, data integration, data analysis, and comparative and evolutionary genomic analysis of the data for the VIMSS project. We have expanded our existing tool sets for comparative and evolutionary genomics and microarray analysis, and created new tools for our proteomic and metabolomic data sets. Our analyses have been incorporated into our comparative genomics website, MicrobesOnline (http://www.microbesonline.org), and made available to the wider research community. By taking advantage of the diverse functional and comparative datasets, we have been able to pursue large evolutionary studies.
Data Analysis: While analyzing various stress responses of Desulfovibrio vulgaris Hildenborough (DvH), the computational core has continued to develop new statistical analyses that take advantage of the regulatory structures (operons, regulons, etc.) predicted by our comparative analyses. This year we have used these analyses to investigate the response of DvH to oxygen stress and pH stress. To interpret oxygen stress, our analysis has focused on the combined results from the transcriptomic and proteomic datasets. Additionally, we have worked with metabolomic datasets within the framework of predicted metabolic activities to find missing pathway members.
Data Management: All data generated by ESPP continue to be stored in our Experimental Information and Data Repository (http://vimss.lbl.gov/EIDR/). Researchers have access to datasets from biomass production, growth curves, image data, mass spectrometry data, phenotype microarray data, and transcriptomic, proteomic and metabolomic data. New functionality has been added for storing information on mutants and protein complexes, along with new visualizations for assessing existing data sets such as the phenotype microarrays.
The MicrobesOnline Database: The MicrobesOnline database (http://www.microbesonline.org) currently holds over 700 microbial genomes and will be updated quarterly, providing an important comparative genomics resource to the community. New functionality added this year includes thousands of phage genomes and plasmids; an updated user interface for the phylogenetic tree-based genome browser, which allows users to view their genes and genomes of interest within an evolutionary framework; tools to compare multiple microarray expression datasets across genes and genomes; external microarray data from the Many Microbial Microarrays Database; integration with the RegTransBase database of experimentally verified regulatory binding sites; and links to three-dimensional structures of proteins and their close relatives. MicrobesOnline continues to provide an interface for genome annotation, which, like all the tools reported here, is freely available to the scientific community. To keep up with the rapidly expanding set of sequenced genomes, we have begun to investigate methods for accelerating our annotation pipeline. In particular, we have completed work on methods to speed up the most time-consuming step: homology searching through HMM alignments and all-against-all BLAST. These methods now enable us to handle the many millions of gene sequences generated by metagenomics. Over the next year, several new features will be added to the MicrobesOnline resource. Microarray expression data will be added from the NCBI GEO database, in addition to datasets generated by the VIMSS team. To supplement our existing analysis tools (enrichment of functional genes and operon-wise analysis), we will provide tools for comparing multiple experiments across multiple genomes. We will also expand our regulatory binding motif search to incorporate co-expression data in support of predictions.
Evolutionary Analysis: The computational core continues work on understanding the evolution of regulatory networks. Transcription factors form large para