Deciphering the complex regulatory programs controlling gene expression is key to gaining insight into countless biological processes. However, a comprehensive characterization of the regulatory elements controlling expression across diverse cell types remains elusive. Analysis of DNA sequence provides insights into potential regulatory regions but cannot on its own provide functional evidence of regulation. Biochemical assays like ChIP-seq and ATAC-seq map epigenetic marks and regions of open chromatin associated with regulatory activity across the genome in a wide variety of cell and tissue types, but do not directly measure regulatory activity. Functional characterization assays like massively parallel reporter assays or CRISPR interference screens offer more direct evidence of regulatory activity but may have limited genomic coverage and cell type availability. Computational methods integrating these diverse data types can enable the prediction and interpretation of regulatory elements across the genome.
Here, I present integrative modeling approaches that combine epigenomic, functional, and DNA sequence data for the comprehensive annotation of the human regulatory genome. First, I introduce ChromActivity, a computational method for annotating the regulatory genome across hundreds of cell and tissue types. ChromActivity integrates epigenomic data from over a hundred human cell and tissue types with a diverse set of functional characterization datasets to generate genome-wide annotations of regulatory activity. These annotations comprise discrete states reflecting combinatorial activity patterns and continuous scores reflecting predicted regulatory element activity. Next, I present SHARPR-seq, a computational method that integrates DNA sequence information to extend the Sharpr-MPRA framework for high-resolution mapping of regulatory activity. SHARPR-seq improves upon the SHARPR method on multiple evaluation metrics, enabling improved functional dissection of regulatory elements controlling gene expression. Together, these integrative modeling approaches demonstrate the utility of combining complementary data types to provide a more comprehensive understanding of the human regulatory landscape.