Exploiting Regulatory Heterogeneity to Systematically Identify Enhancers with High Accuracy
- Author(s): Arbel, Hamutal
- Advisor(s): Bickel, Peter J
- et al.
Enhancer discovery through computational means has long been a goal of the genomics community. The tools developed for this purpose, however, tend to underperform when evaluated on fully held-out test sets. Here we use the pregastrula patterning network of Drosophila melanogaster to demonstrate that the loss of accuracy on held-out data results from heterogeneity of functional signatures in enhancer elements. We show that at least two classes of enhancer are active during early Drosophila embryogenesis and that by focusing on a single, relatively homogeneous class of elements, extremely high (>98%) prediction accuracy can be achieved in a balanced, held-out test set. The homogeneous set is composed predominantly of enhancers driving multi-stage, large segmentation patterns in the early embryo, and hence we term them segmentation-driving enhancers (SDEs). Prediction is driven primarily by transcription factor DNA occupancy, with almost no power derived from histone modifications, including H3K27ac, casting further doubt on the utility of histone modifications for demarcating enhancer elements. The transcription factors used in the prediction process constitute over half of the transcription factors identified in genetic screens as patterning the early embryo, and hence provide a remarkably expansive view of this process. Applying this method to a genome-wide scan, we predict 1,600 SDEs, 916 of which are novel, covering approximately 1.6% of the euchromatic genome. We verified these predictions by testing 41 novel SDEs using in situ whole-embryo imaging of stably integrated reporter constructs. We confirmed 39 of these predictions, a precision of 95% in a genome-wide scan with an estimated recall of 98%, indicating that our reported collection of SDEs may be close to comprehensive.
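The reported precision follows directly from the validation counts (39 confirmed out of 41 tested). The sketch below reproduces that arithmetic and, purely as an illustration that is not part of the original analysis, attaches a Wilson score interval to show how much uncertainty a sample of 41 carries. The function names `precision` and `wilson_interval` are hypothetical, and the choice of a 95% Wilson interval is an assumption made for this example.

```python
import math


def precision(confirmed: int, tested: int) -> float:
    """Fraction of tested predictions that were confirmed."""
    if tested <= 0:
        raise ValueError("tested must be positive")
    return confirmed / tested


def wilson_interval(confirmed: int, tested: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to an approximate 95% interval (an assumption
    of this sketch, not a value taken from the abstract).
    """
    p = precision(confirmed, tested)
    n = tested
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Counts taken from the abstract: 39 of 41 tested novel SDEs were confirmed.
p_hat = precision(39, 41)
low, high = wilson_interval(39, 41)
print(f"precision = {p_hat:.1%}")
print(f"approximate 95% Wilson interval = ({low:.1%}, {high:.1%})")
```

The interval is included only to indicate that the 95% figure rests on 41 tested elements; it makes no claim about how the recall estimate in the abstract was obtained.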