eScholarship
Open Access Publications from the University of California

UC San Diego

UC San Diego Electronic Theses and Dissertations

Navigating the Human Epigenome through Random Forests

Abstract

With the recent identification of over 100 histone modifications in mammalian cell-types, there is an urgent need to discover the minimal set of modifications that can completely characterize a genomic element. Of particular interest are transcriptional enhancers, which play critical roles in cell-type specific gene expression but are difficult to characterize because they often act distally to the genes they regulate. We developed a Random Forest-based algorithm, RFECS (Random Forest based Enhancer identification from Chromatin State), for genome-wide prediction of enhancers, which allowed us to identify the most informative and robust set of three chromatin marks for enhancer prediction. In addition, RFECS predicted enhancers more accurately than previous methods. Applying this method to other genomic elements, we identified the minimal set of histone modifications required for prediction of promoters and gene bodies. Further, we elucidated the distinctive localization of histone lysine acetylations at enhancers, promoters and gene bodies, and obtained novel insights into the association of chromatin modification patterns with splicing. Using our algorithm, we predicted enhancers and promoters in 26 human primary tissues and 6 cell-lines, including 5 early developmental lineages. This led us to the discovery of a novel class of cis-regulatory elements that can behave as enhancers in one cell-type and promoters in another. Further, we were able to associate the evolutionary conservation of regulatory sequences with properties such as tissue-specificity. RFECS is a powerful algorithm with a two-fold advantage. First, we can identify the most informative set of modifications characterizing or distinguishing particular genomic elements, thus providing insight into the biological mechanism of function at these regions. Second, we can make accurate genome-wide predictions of enhancers and promoters, enabling the comparison of regulatory mechanisms across various human tissues or cellular conditions. Variations of histone modification patterns at the predicted tissue-specific cis-regulatory elements may substantially influence gene expression, which could potentially explain the distinct phenotypes of genotypically identical tissues.
