eScholarship
Open Access Publications from the University of California

UC Irvine Electronic Theses and Dissertations

Deep Representation Learning for Single-cell Sequencing Data Analysis

Creative Commons 'BY' version 4.0 license
Abstract

Single-cell sequencing assays now provide comprehensive genomic readouts at single-cell resolution. These measurements give researchers unprecedented opportunities to study cell heterogeneity and elucidate transcriptional regulatory mechanisms. However, computational modeling of single-cell sequencing data is challenging because of its high dimensionality, extreme sparsity, complex dependencies, and high sensitivity to noise from various sources.

In this thesis, we present our work on designing representation learning frameworks that handle these sources of noise and effectively learn meaningful representations of cells and genes from large-scale single-cell sequencing datasets. In the first part, we present a design that uses deep generative models to learn confounding-free representations of cells through invariant representation learning on scATAC-seq data. By eliminating the variation due to confounding factors in the latent space through mutual information minimization, our method produces more biologically meaningful representations of cells, which significantly benefits downstream analyses. As a follow-up, we present our strategy for extending this framework to a multi-modal setting. Instead of performing hard alignment by projecting both modalities into a shared latent space, our method encourages the local structures of the two modalities, as measured by pairwise similarities, to be similar. This strategy is more robust against overfitting to noise and facilitates various downstream analyses such as clustering, imputation, and marker gene detection. In the second line of work, we present our design of foundation models that learn meaningful semantic representations of genes from broad scRNA-seq datasets. We show that pretraining foundation models on large-scale single-cell datasets enables the models to learn meaningful gene features that are transferable to many other downstream tasks. The pretrained model can also be adapted for imputation tasks with strong performance.
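
The contrast between hard alignment and matching local structure can be illustrated with a small sketch. The abstract does not specify the similarity measure or the loss used in the thesis, so everything below is an assumption made for illustration: cosine similarity as the pairwise measure, a mean squared difference as the penalty, and the hypothetical names `cosine_similarity_matrix` and `local_structure_loss`. The point it shows is that the two modalities may keep separate latent spaces, even of different dimensions, because only the cell-to-cell similarity patterns are compared.

```python
import math


def cosine_similarity_matrix(embeddings, eps=1e-12):
    """Return the n x n matrix of cosine similarities between rows.

    `embeddings` is a list of n equal-length vectors, one per cell.
    `eps` guards against division by zero for all-zero vectors.
    """
    norms = [math.sqrt(sum(x * x for x in row)) for row in embeddings]
    n = len(embeddings)
    sims = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dot = sum(a * b for a, b in zip(embeddings[i], embeddings[j]))
            sims[i][j] = dot / max(norms[i] * norms[j], eps)
    return sims


def local_structure_loss(z_a, z_b):
    """Illustrative (assumed) loss: mean squared difference between the
    off-diagonal pairwise similarities of two modalities.

    z_a and z_b hold embeddings of the same cells, in the same order.
    Their latent dimensions do not have to match.
    """
    if len(z_a) != len(z_b):
        raise ValueError("both modalities must embed the same cells")
    n = len(z_a)
    if n < 2:
        raise ValueError("need at least two cells to compare pairs")
    s_a = cosine_similarity_matrix(z_a)
    s_b = cosine_similarity_matrix(z_b)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:  # self-similarity carries no structural information
                total += (s_a[i][j] - s_b[i][j]) ** 2
    return total / (n * (n - 1))


# Toy embeddings for three cells: 2-D in one modality, 3-D in the other.
z_rna = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
z_atac = [[2.0, 0.0, 0.0], [0.0, 3.0, 0.0], [1.0, 1.0, 0.0]]
loss = local_structure_loss(z_rna, z_atac)
```

In this toy case the two embeddings differ in scale and dimension but share the same neighborhood geometry, so the penalty is essentially zero; a hard alignment that required the coordinates themselves to coincide could not even be defined here without an extra projection.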
