MotivationThere is recent interest in using gene expression data to contextualize findings from traditional genome-wide association studies (GWAS). Conditioned on a tissue, expression quantitative trait loci (eQTLs) are genetic variants associated with gene expression, and eGenes are genes whose expression levels are associated with genetic variants. eQTLs and eGenes provide great supporting evidence for GWAS hits and important insights into the regulatory pathways involved in many diseases. When a significant variant or a candidate gene identified by GWAS is also an eQTL or eGene, there is strong evidence to further study this variant or gene. Multi-tissue gene expression datasets like the Gene Tissue Expression (GTEx) data are used to find eQTLs and eGenes. Unfortunately, these datasets often have small sample sizes in some tissues. For this reason, there have been many meta-analysis methods designed to combine gene expression data across many tissues to increase power for finding eQTLs and eGenes. However, these existing techniques are not scalable to datasets containing many tissues, like the GTEx data. Furthermore, these methods ignore a biological insight that the same variant may be associated with the same gene across similar tissues.
ResultsWe introduce a meta-analysis model that addresses these problems in existing methods. We focus on the problem of finding eGenes in gene expression data from many tissues, and show that our model is better than other types of meta-analyses.
Availability and implementationSource code is at https://github.com/datduong/RECOV .
Contacteeskin@cs.ucla.edu or email@example.com.
Supplementary informationSupplementary data are available at Bioinformatics online.