eScholarship
Open Access Publications from the University of California

UC Riverside

UC Riverside Electronic Theses and Dissertations

Studying the Spatial Organizations of Chromosomes With Machine Learning

Creative Commons 'BY' version 4.0 license
Abstract

The Hi-C technique has enabled genome-wide mapping of chromatin interactions and the investigation of the organizational principles of the genome's spatial structure. However, computational methods for analyzing Hi-C data are still at an early stage. Pioneering models have been developed to reconstruct 3D genome structures, but there is no consistent way to measure and compare their results, which makes it difficult for users to interpret the organization of the 3D structures and to understand genome function. Moreover, 3D genome modeling becomes more complicated at higher resolution because of the sparsity and diversity of bulk Hi-C data and limited computational resources. Furthermore, high-resolution Hi-C requires costly deep sequencing, so it has been achieved for only a limited number of cell types. Neural networks have been developed to address these problems at high resolution.

In this work, we first comprehensively reviewed 3D structure modeling methods. We developed a method for simulating Hi-C data from 3D conformations produced by single-cell modeling, along with several evaluation metrics for measuring the similarity between structures. We then profiled the performance of existing bulk Hi-C based 3D genome structure modeling methods using both simulated and real bulk Hi-C data.
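The abstract does not name the evaluation metrics it introduces. As a hedged illustration only, the Python/NumPy sketch below shows two measures that are commonly used to compare 3D chromatin structures: the RMSD after optimal rigid superposition (the Kabsch algorithm) and the Pearson correlation between the two structures' pairwise-distance matrices. The function names `kabsch_rmsd` and `pairwise_distance_pcc` are hypothetical and are not taken from the dissertation.

```python
import numpy as np


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between two (n, 3) structures after optimal rotation and translation."""
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    # SVD of the 3 x 3 covariance matrix yields the optimal rotation (Kabsch).
    U, _, Vt = np.linalg.svd(P.T @ Q)
    # Flip one axis if needed so that R is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) > 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def pairwise_distance_pcc(P: np.ndarray, Q: np.ndarray) -> float:
    """Pearson correlation between the upper triangles of the two
    pairwise-distance matrices; invariant to rotation, translation and scale."""
    def dists(X: np.ndarray) -> np.ndarray:
        return np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    iu = np.triu_indices(len(P), k=1)
    return float(np.corrcoef(dists(P)[iu], dists(Q)[iu])[0, 1])


# Example: compare a toy structure with transformed copies of itself.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
c, s = np.cos(0.5), np.sin(0.5)
Rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
print(kabsch_rmsd(X, X @ Rz.T + 1.0))               # close to 0 (float error only)
print(pairwise_distance_pcc(X, 2.0 * (X @ Rz.T)))   # close to 1
```

RMSD depends on the absolute scale of the coordinates, whereas the distance-matrix correlation does not; this matters because structures reconstructed from Hi-C are often defined only up to scale. The sketch also treats mirror images as different structures; if chirality cannot be identified from the data, one option is to compare against both Q and its reflection and keep the smaller RMSD.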

Next, we proposed GIST, a novel method for predicting 3D structures at high resolution based on an autoencoder with graph attention networks (GAT). We converted Hi-C data into a heterogeneous graph, which GIST encodes as a population of 3D conformations optimized through edge classification. We demonstrated that GIST produced chromosome structures consistent with FISH data and outperformed existing 3D modeling methods. We illustrated the diversity of the predicted 3D structures by evaluating the structures of the active and inactive X chromosomes.
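To make the graph-attention step more concrete, here is a minimal NumPy sketch of one single-head GAT layer, following the attention formulation of Veličković et al. (2018), applied to a graph built from a Hi-C contact matrix. It is not GIST: for simplicity it assumes a homogeneous graph in which genomic bins are nodes and bins sharing at least `min_count` contacts are linked, and it omits the heterogeneous graph construction, the decoder, and the edge-classification objective. The names `hic_to_adjacency`, `gat_layer`, and `min_count` are hypothetical.

```python
import numpy as np


def hic_to_adjacency(contacts: np.ndarray, min_count: float = 1.0) -> np.ndarray:
    """Boolean adjacency over genomic bins (hypothetical, homogeneous graph):
    bins i and j are linked if they share at least `min_count` contacts.
    Self-loops are added, as in the original GAT formulation."""
    A = contacts >= min_count
    A = A | A.T                      # make the graph undirected
    np.fill_diagonal(A, True)
    return A


def leaky_relu(x: np.ndarray, slope: float = 0.2) -> np.ndarray:
    return np.where(x > 0, x, slope * x)


def gat_layer(H, A, W, a):
    """One single-head graph attention layer.

    H: (n, f_in) node features; A: (n, n) boolean adjacency with self-loops;
    W: (f_in, f_out) shared linear map; a: (2 * f_out,) attention vector.
    Returns the aggregated features (n, f_out) and attention weights (n, n).
    """
    Z = H @ W
    f_out = W.shape[1]
    # e_ij = LeakyReLU(a^T [z_i || z_j]), computed for all pairs at once.
    e = leaky_relu((Z @ a[:f_out])[:, None] + (Z @ a[f_out:])[None, :])
    # Keep only neighbours, then softmax over each node's neighbourhood.
    e = np.where(A, e, -np.inf)
    e = e - e.max(axis=1, keepdims=True)   # numerical stability
    alpha = np.exp(e)
    alpha /= alpha.sum(axis=1, keepdims=True)
    return alpha @ Z, alpha


# Example on a toy symmetric contact matrix with random features and weights.
rng = np.random.default_rng(0)
contacts = rng.poisson(1.0, size=(10, 10)).astype(float)
contacts = contacts + contacts.T     # Hi-C matrices are symmetric
A = hic_to_adjacency(contacts)
H = rng.normal(size=(10, 4))         # hypothetical initial bin features
out, alpha = gat_layer(H, A, rng.normal(size=(4, 3)), rng.normal(size=6))
print(out.shape)  # (10, 3)
```

Stacking several such layers, with a nonlinearity such as ELU between them and several heads per layer, gives the usual GAT encoder; how GIST turns the resulting embeddings into a population of 3D conformations is not specified in this abstract.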

Lastly, we proposed EnHiC, a novel method for predicting high-resolution Hi-C from low-resolution input based on a generative adversarial network (GAN). Inspired by non-negative matrix factorization, EnHiC fully exploits the distinctive properties of Hi-C matrices and extracts rank-1 features from multi-scale low-resolution matrices to enhance the resolution. We demonstrated that EnHiC accurately and reliably enhanced the resolution of Hi-C data and outperformed other GAN-based models. The high-resolution matrices predicted by EnHiC enabled accurate detection of topologically associated domains and fine-scale chromatin interactions.
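As a hedged illustration of what a rank-1 feature of a multi-scale Hi-C matrix can mean, the NumPy sketch below average-pools a symmetric, non-negative contact matrix at several scales, replaces each pooled matrix with its best rank-1 approximation, and upsamples the result back to the original size. This is not EnHiC's learned layer, which is trained inside a GAN; the functions `avg_pool`, `rank1_symmetric`, and `multiscale_rank1_features` and the default scales are hypothetical, and the matrix size is assumed to be divisible by every scale.

```python
import numpy as np


def avg_pool(M: np.ndarray, s: int) -> np.ndarray:
    """Average-pool a square matrix over non-overlapping s x s blocks."""
    n = M.shape[0]
    if n % s:
        raise ValueError(f"matrix size {n} is not divisible by scale {s}")
    return M.reshape(n // s, s, n // s, s).mean(axis=(1, 3))


def rank1_symmetric(M: np.ndarray) -> np.ndarray:
    """Best rank-1 approximation lam * v v^T of a symmetric non-negative matrix.

    For such matrices the largest eigenvalue equals the spectral radius
    (Perron-Frobenius), so by Eckart-Young the top eigenpair gives the
    optimal rank-1 approximation in Frobenius norm.
    """
    eigvals, eigvecs = np.linalg.eigh(M)   # eigenvalues in ascending order
    lam, v = eigvals[-1], eigvecs[:, -1]
    return lam * np.outer(v, v)            # invariant to the sign of v


def multiscale_rank1_features(M: np.ndarray, scales=(1, 2, 4)) -> np.ndarray:
    """Stack rank-1 approximations of M pooled at each scale, upsampled to M's shape."""
    feats = []
    for s in scales:
        R = rank1_symmetric(avg_pool(M, s))
        # Nearest-neighbour upsampling back to the original resolution.
        feats.append(np.kron(R, np.ones((s, s))))
    return np.stack(feats)


# Example on a toy symmetric, non-negative 8 x 8 "contact map".
toy = np.abs(np.random.default_rng(0).normal(size=(8, 8)))
toy = toy + toy.T
print(multiscale_rank1_features(toy).shape)  # (3, 8, 8)
```

The top eigenpair is used here instead of an iterative NMF solver because, for a symmetric non-negative matrix, it already yields the optimal rank-1 factor, and that factor can be chosen non-negative, which keeps the sketch short and deterministic.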
