eScholarship
Open Access Publications from the University of California

UC Santa Barbara

UC Santa Barbara Electronic Theses and Dissertations

Single Cell Multi-modal Analysis Using scDMVAE with an Emphasis on SCoPE2 Technology

Abstract

Effective multi-modal integration of single-cell datasets is critical for uncovering the biological properties of cells from different molecular perspectives. However, integration poses significant challenges: how to preserve shared information while accounting for differences between datasets with different distributions, how to integrate datasets linked by different anchors (cells or features), and how to improve the quality of the datasets being integrated. In this dissertation, we introduce two novel models that address these challenges. First, we present scDMVAE, a neural network model that captures both shared and data-specific aspects of datasets in a latent space. scDMVAE handles cell-linked datasets through its embedding learning component and feature-linked datasets through its attention-based matching component. We demonstrate the effectiveness of scDMVAE on a cell-linked CITE-seq dataset, where it reveals different cell type relations between mRNA and protein, and on feature-linked SCoPE2 proteomics and scRNA-Seq mRNA human testis datasets, where it transfers labels from mRNA to protein. Additionally, we present PCRID, a principal-curve-based model that aligns peptide retention times to improve confidence estimates of peptide-spectrum matches (PSMs) in SCoPE2 technology. PCRID outperforms existing models such as DART-ID by handling non-linearities in retention time more effectively, increasing the peptide identification rate by 154.53% at a posterior error probability (PEP) threshold of 0.01 while controlling false discoveries. Together, these models represent significant advances in single-cell data analysis and have broad applications across related fields.
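
As a rough illustration of how an identification-rate figure like the one quoted above can be read, the sketch below counts PSMs that pass a PEP threshold before and after a hypothetical rescoring step and reports the relative increase. The PEP values, the function names, and the strict "less than" cutoff are all assumptions made for illustration; they are not taken from the dissertation, and the abstract does not state exactly how the 154.53% figure was computed.

```python
def count_identified(peps, threshold=0.01):
    """Count PSMs whose PEP falls below the threshold (strict cutoff assumed)."""
    return sum(1 for p in peps if p < threshold)


def percent_increase(before, after):
    """Relative increase from `before` to `after`, expressed in percent."""
    if before == 0:
        raise ValueError("baseline count must be non-zero")
    return 100.0 * (after - before) / before


# Hypothetical PEPs for the same six PSMs, before and after rescoring.
peps_before = [0.20, 0.005, 0.03, 0.008, 0.5, 0.02]
peps_after = [0.004, 0.001, 0.009, 0.002, 0.3, 0.006]

n_before = count_identified(peps_before)
n_after = count_identified(peps_after)
increase = percent_increase(n_before, n_after)

print(f"identified before: {n_before}, after: {n_after}, increase: {increase:.2f}%")
```

Under this reading, an increase above 100% means that more than twice as many PSMs pass the threshold after rescoring as before it.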
