Computational Approaches to Expand the Applications of Chromatin State Annotations
eScholarship
Open Access Publications from the University of California

UCLA Electronic Theses and Dissertations

Abstract

Genome-wide maps of chromatin marks, such as histone modifications and open chromatin sites, provide valuable information for annotating the non-coding genome. Computational approaches such as ChromHMM have been applied to discover combinatorial and spatial patterns of chromatin marks in a biosample, characterize these patterns as chromatin states, and then segment the biosample’s epigenome into those states. As chromatin mark data are generated for more biosamples, it becomes increasingly difficult to manually study the biological similarities and differences among chromatin state maps across many biosamples. We therefore developed methods to derive epigenome annotations that incorporate data from multiple biosamples and highlight notable epigenetic properties. First, we introduced a large-scale application of ChromHMM that generates a universal chromatin state map of the human genome that can be shared across cell types. Specifically, we trained ChromHMM on input data from >1,000 experiments in >100 human biosamples from the Roadmap Epigenomics and ENCODE projects, and denoted the resulting chromatin state map the ‘full-stack’ annotation. We conducted comprehensive analyses to characterize the biological interpretation of each full-stack state and uncovered patterns of cell-type-specific and constitutive regulatory activity in each state. The full-stack annotation, along with the detailed state characterizations, is useful for researchers seeking to understand the epigenetic context of genomic loci of interest. Building on this work, we developed and analyzed an equivalent universal chromatin state annotation for the mouse genome. We trained this annotation using input data from >900 ChIP-seq, ATAC-seq, or DNase-seq experiments from the ENCODE and Mouse ENCODE projects, characterized the resulting states, and related them to the states of the human full-stack model.
Given the wide use of mice as a model organism for studying human disease mechanisms, the mouse full-stack annotation is expected to be highly useful for researchers investigating mouse epigenetic landscapes. Lastly, we developed a method named CSREP to derive a genome-wide probabilistic summary chromatin state map from data for a group of biosamples with common biological properties. We validated CSREP’s summary chromatin state maps for groups of samples sharing tissue types from the Roadmap Epigenomics and EpiMap projects, and showed that CSREP can better predict the genomic locations of individual chromatin states in held-out biosamples. We further presented an extension of CSREP in which the summary chromatin state maps for two groups of samples are used to prioritize differential chromatin state changes between the groups. Overall, our work aims to derive genome-wide chromatin state annotations that aggregate the patterns of epigenetic assays within and across different cell identities. All the methods we present can be applied to the newer and larger datasets that will become available in the future, and the chromatin state annotations we provide can help the broader community understand regulatory patterns across the human and mouse genomes.
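The core idea behind a ‘full-stack’ input, where each individual experiment (one mark in one biosample) contributes its own feature instead of each biosample being modeled separately, can be illustrated with a minimal sketch. It assumes every experiment has already been binarized into 0/1 calls over the same fixed-size genomic bins (ChromHMM's default bin size is 200 bp). The function name `stack_experiments`, the `mark|biosample` labels, and the toy calls are hypothetical, and the model-training step itself is not shown.

```python
import numpy as np


def stack_experiments(binarized_tracks):
    """Combine per-experiment binary calls into one stacked input matrix.

    binarized_tracks: dict mapping a hypothetical "mark|biosample" label to a
    sequence of 0/1 calls, one entry per genomic bin (same bins for all tracks).
    Returns (matrix, labels), where matrix has shape (n_bins, n_experiments)
    and column j holds the calls for labels[j].
    """
    labels = sorted(binarized_tracks)  # fixed, reproducible column order
    lengths = {len(binarized_tracks[label]) for label in labels}
    if len(lengths) != 1:
        raise ValueError("all tracks must be non-empty and cover the same bins")
    columns = [np.asarray(binarized_tracks[label], dtype=np.uint8) for label in labels]
    return np.column_stack(columns), labels


# Toy example (hypothetical data): 3 experiments across 2 biosamples, 4 bins
tracks = {
    "H3K4me3|E003": [1, 1, 0, 0],
    "H3K27ac|E003": [1, 0, 0, 0],
    "H3K4me3|E004": [1, 1, 1, 0],
}
matrix, labels = stack_experiments(tracks)
print(labels)
print(matrix.shape)  # (4, 3)
```

Each row of the resulting matrix is one bin's combined observation vector; with >1,000 experiments the same layout simply has >1,000 columns, which is what lets a single model learn states that span many cell types.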
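To make the notion of a probabilistic summary map and a differential score concrete, here is a deliberately simplified sketch. At each genomic bin it estimates the probability of each chromatin state as the fraction of samples in a group assigned that state, then scores differential changes as the difference between two groups' summary probabilities. This per-bin frequency average is an illustrative stand-in chosen for simplicity; it is not CSREP's model. The function names, the number of states, and the toy state maps are all hypothetical.

```python
import numpy as np


def summary_state_probabilities(state_calls, n_states):
    """Per-bin state frequencies across a group of samples (simple baseline).

    state_calls: int array of shape (n_samples, n_bins), values in [0, n_states);
    each row is one sample's chromatin state map.
    Returns an array of shape (n_bins, n_states) whose rows sum to 1.
    """
    calls = np.asarray(state_calls)
    if calls.min() < 0 or calls.max() >= n_states:
        raise ValueError("state labels must lie in [0, n_states)")
    one_hot = np.eye(n_states)[calls]  # shape (n_samples, n_bins, n_states)
    return one_hot.mean(axis=0)  # fraction of samples in each state per bin


def differential_scores(group_a, group_b, n_states):
    """Signed difference of summary probabilities (group A minus group B)."""
    return (summary_state_probabilities(group_a, n_states)
            - summary_state_probabilities(group_b, n_states))


def rank_bins_for_state(diff, state):
    """Bin indices ordered from most to least enriched for `state` in group A."""
    return np.argsort(-diff[:, state], kind="stable")


# Toy example (hypothetical data): 2 samples per group, 3 bins, 3 states
group_a = [[0, 1, 2], [0, 1, 1]]
group_b = [[2, 1, 2], [2, 1, 2]]
summary_a = summary_state_probabilities(group_a, 3)
diff = differential_scores(group_a, group_b, 3)
print(rank_bins_for_state(diff, 1))  # [2 0 1]
```

Averaging one-hot calls is the simplest possible aggregation and is used here only to show the shape of the inputs and outputs: bins with large positive scores for a state are those where that state is more prevalent in group A than in group B.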
