eScholarship
Open Access Publications from the University of California

UC Berkeley

UC Berkeley Electronic Theses and Dissertations

Statistical Analysis and Visualization of Single Cell RNA Sequencing Data at Population Scale

Abstract

The advent of single-cell transcriptome sequencing (scRNA-Seq) has revolutionized our ability to explore the intricate landscape of cellular diversity within complex biological systems. Initially focused on cataloging cell subtypes and discerning gene expression differences across cell types, scRNA-Seq has evolved to address broader questions, particularly in human health. While past efforts concentrated on analyzing many cells from a few samples, there is now growing interest in understanding inter-sample heterogeneity and its implications for phenotypic outcomes, notably in cancer and inflammatory diseases. However, existing bioinformatic methodologies do not adequately address population-level analyses and give limited consideration to inter-sample variation. This dissertation introduces a novel framework, termed the GloScope Representation, for representing the entire single-cell profile of a sample; the framework is described in detail in Chapter 1. In Chapter 2, we apply GloScope to scRNA-Seq datasets spanning diverse study designs, with sample sizes ranging from 12 to over 300. Through illustrative examples, we show how GloScope enables researchers to carry out key bioinformatic tasks at the sample level, with a primary focus on visualization and quality control assessment. In Chapter 3, we demonstrate GloScope’s efficacy in evaluating and quantifying batch effects, and in comparing the performance of batch correction methods in patient-level analysis of scRNA-Seq data. In Chapter 4, to assess GloScope’s advantages and effectiveness in detecting different classes of single-cell differences arising from variation in sample phenotypes, we compare GloScope with existing visualization tools and other sample-level analysis tools. We also develop a simulation pipeline for generating single-cell count data and use it to evaluate GloScope quantitatively through a series of simulated experiments.
