eScholarship
Open Access Publications from the University of California

UC Riverside

UC Riverside Electronic Theses and Dissertations

Efficient Algorithms for the Analysis of Hi-C Contact Maps

Abstract

This dissertation deals with the analysis of high-throughput chromatin conformation capture (Hi-C) data. Hi-C experiments provide genome-wide maps of chromatin interactions and have enabled life scientists to investigate the role of the three-dimensional structure of genomes in gene regulation and other essential cellular functions. Several studies have confirmed the existence of fundamental 3D structural features at different scales that are stable across cell types and conserved across species, e.g., topologically associating domains (TADs) and chromatin loops.

The research presented here is organized around three main topics in the analysis of contact maps, namely (1) the detection of TADs, (2) the comparison of two contact maps, and (3) the detection of chromatin loops. The detection of TADs has become a critical step in the analysis of Hi-C data, e.g., to identify enhancer-promoter associations. First, we present East, a novel TAD identification algorithm based on fast 2D convolution of Haar-like features that is as accurate as the state-of-the-art method based on the directionality index, but 75-80× faster.
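To illustrate why Haar-like features can be evaluated quickly, the following is a minimal sketch, not the East implementation. It assumes a dense, symmetric contact matrix and uses a summed-area table so that any rectangular sum costs constant time. The function names, the window parameter `w`, and the particular feature (contacts within the two flanking squares minus twice the contacts between them) are hypothetical choices made for this example.

```python
import numpy as np

def integral_image(m):
    """Summed-area table of m, with a zero row and column prepended."""
    s = np.zeros((m.shape[0] + 1, m.shape[1] + 1))
    s[1:, 1:] = m.cumsum(axis=0).cumsum(axis=1)
    return s

def box_sum(s, r0, c0, r1, c1):
    """Sum of m[r0:r1, c0:c1] in O(1), given the summed-area table s."""
    return s[r1, c1] - s[r0, c1] - s[r1, c0] + s[r0, c0]

def boundary_score(s, i, w):
    """Hypothetical Haar-like feature at bin i (assumes w <= i <= n - w).

    High when contacts are dense within the w bins on each side of i
    and sparse between the two sides, as expected at a domain boundary.
    """
    upstream = box_sum(s, i - w, i - w, i, i)
    downstream = box_sum(s, i, i, i + w, i + w)
    across = box_sum(s, i - w, i, i, i + w)
    return upstream + downstream - 2 * across

# Toy contact map with two 4-bin domains and no contacts between them.
toy = np.zeros((8, 8))
toy[:4, :4] = 1
toy[4:, 4:] = 1
table = integral_image(toy)
scores = [boundary_score(table, i, 2) for i in range(2, 7)]
```

On this toy map the score peaks at bin 4, the boundary between the two blocks, and each evaluation touches only twelve table entries regardless of the window size.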

Another fundamental problem in the analysis of Hi-C data is the comparison of two contact maps derived from Hi-C experiments to identify their functional differences. Detecting similarities and differences between contact maps is critical for evaluating the reproducibility of replicate experiments and for identifying differential genomic regions with biological significance. Due to the complexity of chromatin conformations and the presence of technology-driven and sequence-specific biases, the comparative analysis of Hi-C data is analytically and computationally challenging. Second, we present Selfish, a novel approach to the comparative analysis of Hi-C data that takes advantage of the structural self-similarity in contact maps. We define a self-similarity measure and use it to design algorithms for (i) measuring the reproducibility of Hi-C replicate experiments and (ii) finding differential chromatin interactions between two contact maps. Extensive experimental results on simulated and real data show that Selfish is more accurate and robust than state-of-the-art methods.

Regulatory elements at large genomic distances can engage in gene regulation by making direct physical contact with their target genes or loci, bringing distant loci into close spatial proximity and forming chromatin loops. These long-range interactions form complex regulatory networks that need to be carefully studied. Analyzing chromatin interactions between regulatory elements and genes at high resolution using the high-throughput chromosome conformation capture method Hi-C can provide fundamental insights into the spatial organization of chromosomes and its effect on gene regulation. Third, we present Mustache, a new method for detecting significant chromatin interactions genome-wide. Mustache robustly finds pairs of chromatin loci whose interaction is significant relative to the expected interaction. A wide range of experiments shows that the detected interactions are biologically supported: they are associated with promoter-enhancer and promoter-promoter contacts, are mediated by different proteins, and are stable across cell types.
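The notion of an interaction being strong relative to the expected interaction can be made concrete with a common baseline, sketched below. This is not the Mustache algorithm; it only assumes that the expected contact count at a given genomic distance is the mean of the corresponding diagonal of a dense, symmetric contact matrix. The function names are hypothetical.

```python
import numpy as np

def expected_by_distance(m):
    """Mean contact count on each diagonal, i.e. at each bin distance."""
    n = m.shape[0]
    return np.array([np.diagonal(m, d).mean() for d in range(n)])

def observed_over_expected(m):
    """Divide each entry by the expected count at its bin distance."""
    n = m.shape[0]
    expected = expected_by_distance(m)
    idx = np.arange(n)
    dist = np.abs(idx[:, None] - idx[None, :])  # bin distance of each entry
    e = expected[dist]
    out = np.zeros(m.shape, dtype=float)
    np.divide(m, e, out=out, where=e > 0)       # leave 0 where expected is 0
    return out

# Toy map with distance decay 8, 4, 2, 1 and one enriched pair (0, 2).
toy = np.array([[8, 4, 6, 1],
                [4, 8, 4, 2],
                [6, 4, 8, 4],
                [1, 2, 4, 8]], dtype=float)
ratio = observed_over_expected(toy)
```

Entries with a ratio well above 1 are candidates for enriched interactions; a real method would additionally need a statistical test and bias normalization, which this sketch omits.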
