eScholarship
Open Access Publications from the University of California

UC Santa Cruz

UC Santa Cruz Electronic Theses and Dissertations

Integrating Prior Biological Knowledge and Machine Learning for Single-Cell Transcriptomics Analysis

License: Creative Commons BY-SA 4.0
Abstract

Single-cell RNA sequencing (scRNA-Seq) offers a unique window into cellular identity at unprecedented scale and resolution. However, revealing this cellular identity remains challenging. For example, annotating each assayed cell with a cell type label that indicates its functional identity still relies on manual examination, which is rate-limiting and poses reproducibility issues. Similarly, inferring the activity of the gene regulatory pathways that specify cell state relies on methods that were designed for bulk RNA sequencing data and do not make use of the large amount of data generated by single-cell experiments. Here, I describe my work combining machine learning with prior biological knowledge about cellular entities, drawn from curated databases, to shed light on the identity of single cells. Specifically, I developed statistical frameworks for the automated annotation of single-cell transcriptomes with cell type labels by integrating prior cell ontology information and cell type-specific marker gene sets. Then, I developed a method to infer pathway activity in single cells using recent progress in deep generative modeling together with prior knowledge from gene annotation databases. I discuss potential future directions for designing generative model architectures that approach the more ambitious task of modeling targeted perturbations of pathways or transcription factors, in order to perform in-silico experiments and alter cellular state at the single-cell level. Finally, I present collaborative work, notably on generalizing drug response prediction from bulk transcriptomic profiles of cell lines to cancer patients by integrating information about chemical structure into the predictive model. This body of work contributes to the growing literature on methods that incorporate prior knowledge about biological systems into complex machine learning frameworks, and it highlights the challenges met in such integration.
