InterMine Webservices for Phytozome (Rev2)
A datawarehousing framework for information provides a useful infrastructure for providers and users of genomic data. For providers, the infrastructure give them a consistent mechanism for extracting raw data. While for the users, the web services supported by the software allows them to make complex, and often unique, queries of the data. Previously, phytozome.net used BioMart to provide the infrastructure. As the complexity, scale and diversity of the dataset as grown, we decided to implement an InterMine web service on our servers. This change was largely motivated by the ability to have a more complex table structure and richer web reporting mechanism than BioMart. For InterMine to achieve its more complex database schema it requires an XML description of the data and an appropriate loader. Unlimited one-to-many and many-to-many relationship between the tables can be enabled in the schema.We have implemented support for:1.) Genomes and annotations for the data in Phytozome. This set is the 48 organisms currently stored in a back end CHADO datastore. The data loaders are modified versions of the CHADO data adapters from FlyMine. 2.) Interproscan results from all proteins in the Phytozome database. 3.) Clusters of proteins into a grouped heirarchically by similarity. 4.) Cufflinks results from tissue-specific RNA-Seq data of Phytozome organisms. 5.) Diversity data (GATK and SnpEFF results) from a set of individual organism. The last two datatypes are new in this implementation of our web services. We anticipate that the scale of these data will increase considerably in the near future.