eScholarship
Open Access Publications from the University of California

UC Riverside

UC Riverside Electronic Theses and Dissertations

Statistical Methods for Facilitating Single-Cell 3D Genome Analysis

No data is associated with this publication.
Creative Commons 'BY-NC' version 4.0 license
Abstract

The rapid growth of single-cell Hi-C (scHi-C) technologies has enabled researchers to explore 3D genome organization and reveal heterogeneity among single cells. As genome studies call for more comprehensive analyses, computational methods are also developing rapidly to facilitate statistical analysis and unveil genome features. However, given the high cost of these sequencing techniques, the growing number of computational approaches and experiments requires suitable simulators for scHi-C data that can support method benchmarking and efficient experimental design. Beyond the demand for simulation, integration methods are still at an early stage, as scHi-C has only recently expanded in scale. This expansion has been achieved through multiple experiments conducted at different times by various laboratories, which introduces batch effects that complicate subsequent data analyses. In this dissertation, we propose two statistical methods, scHiCPRSiM and SHIM, to address these limitations.

scHiCPRSiM is a versatile and robust statistical simulator of scHi-C data. Our approach generates scHi-C contact maps that enable researchers to quantitatively assess scHi-C experimental designs and benchmark existing analytical approaches. Notably, scHiCPRSiM generates realistic scHi-C datasets that closely resemble real data and capture vital chromatin structure features; provides guidance for optimizing experimental design by balancing cell clustering accuracy against budget constraints; and facilitates the performance evaluation and comparison of scHi-C analytical methods.

SHIM is a novel statistical integration model for multiple scHi-C datasets from different laboratories and experiments. By applying it to real scHi-C datasets with batch effects, we demonstrate that SHIM can effectively remove batch effects within data from the same laboratory and accurately merge cells of the same cell type. Furthermore, we show that SHIM can integrate scHi-C data from different laboratories and regions, leading to improved performance in clustering analyses. Our approach offers a robust and statistically elegant solution for integrating multiple scHi-C datasets, facilitating accurate downstream analysis of chromatin structure and applications in cell functionality projection.

Main Content

This item is under embargo until July 19, 2025.