Data-Augmentation for Single-Cell Transcriptomics and Multi-Omics
- Karikomi, Matthew K
- Advisor(s): Nie, Qing
Abstract
With the advent of single-cell transcriptomics and epigenomics, the goals of experimental biology have shifted toward interpretation of the global response to well-defined experimental perturbations. These investigations have two descriptive components: (1) the identification of cells and (2) the characterization of how interactions between these units (both individually and collectively) shape this identity. In many cases, these problems involve the identification of unknown biological patterns at multiple scales. This hierarchy of unknowns results in a chicken-or-the-egg problem that I will address through techniques that operate directly on well-defined transformations of raw data, an approach commonly referred to as “data-augmentation”. The relationship between these contributions closely follows the above biological goals. By necessity, the fulfilment of (1) involves measurements that are far noisier than those of traditional techniques, which are both more accurate and more limited in the scope of variation they reveal. In this context, (2) introduces extra challenges: it requires accurate, low-noise input, but also benefits from prior knowledge in order to improve its prediction of physically constrained interactions. I address the denoising problem raised by (1) in scRNA-seq by developing an iterative imputation algorithm, and I demonstrate that it facilitates the cell-cell communication (CCC) inference problem posed by (2). I also introduce a data-mining tool whose ability to mine and annotate machine-readable knowledge graphs greatly facilitates the parameterization of existing CCC inference pipelines. Finally, I explore strategies for the prediction of adaptive inflammatory response through global alterations in the coupling of transcription and accessibility.