Predicting Tissue-Specific Enhancers in the Human Genome
Determining how transcriptional regulatory signals are encoded in vertebrate genomes is essential for understanding the origins of multicellular complexity, yet the genetic code of vertebrate gene regulation remains poorly understood. In an attempt to elucidate this code, we combined genome-wide gene expression profiling, vertebrate genome comparisons, and transcription factor binding site analysis to define sequence signatures characteristic of candidate tissue-specific enhancers in the human genome. We applied this strategy to microarray-based gene expression profiles from 79 human tissues and identified 7,187 candidate enhancers that defined the expression of their flanking genes, the majority of which were located outside of known promoters. We cross-validated this method for its ability to predict tissue-specific gene expression de novo and confirmed its reliability in 57 of the 79 available human tissues, with an average precision in enhancer recognition ranging from 32 percent to 63 percent and a sensitivity of 47 percent. We used the sequence signatures identified by this approach to assign tissue-specific predictions to ~328,000 human-mouse conserved noncoding elements in the human genome. By overlapping these genome-wide predictions with a large in vivo dataset of enhancers validated in transgenic mice, we confirmed our results with a sensitivity of 28 percent and a precision of 50 percent. These results indicate the power of combining complementary genomic datasets as an initial computational foray into the global view of tissue-specific gene regulation in vertebrates.
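The precision and sensitivity figures quoted above come from overlapping predicted elements with experimentally validated ones. The following is a minimal sketch of how such an overlap-based evaluation could be computed. It is not the authors' pipeline: the half-open interval representation, the any-overlap criterion, the function names, and the toy coordinates are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical representation: (chromosome, start, end), half-open coordinates.
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if two intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def precision_sensitivity(
    predicted: List[Interval], validated: List[Interval]
) -> Tuple[float, float]:
    """Compute overlap-based precision and sensitivity.

    precision   = fraction of predicted elements that overlap a validated element
    sensitivity = fraction of validated elements that overlap a predicted element
    """
    confirmed = sum(any(overlaps(p, v) for v in validated) for p in predicted)
    recovered = sum(any(overlaps(v, p) for p in predicted) for v in validated)
    precision = confirmed / len(predicted) if predicted else 0.0
    sensitivity = recovered / len(validated) if validated else 0.0
    return precision, sensitivity


# Toy coordinates invented for illustration only; they are not data from the study.
predicted = [
    ("chr1", 100, 200),
    ("chr1", 500, 600),
    ("chr2", 50, 150),
    ("chr3", 10, 20),
    ("chr5", 1, 10),
]
validated = [
    ("chr1", 150, 250),
    ("chr2", 100, 120),
    ("chr2", 900, 1000),
]

precision, sensitivity = precision_sensitivity(predicted, validated)
print(f"precision={precision:.2f}, sensitivity={sensitivity:.2f}")
```

The quadratic scan is adequate for a toy example. For hundreds of thousands of conserved elements, sorting the intervals or using an interval tree would be the more practical choice.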