Integrated computational analysis of brain cell transcriptomes and epigenomes
eScholarship
Open Access Publications from the University of California

UC San Diego

UC San Diego Electronic Theses and Dissertations

Abstract

The mammalian brain consists of a vast network of neurons and non-neuronal cells with diverse morphology, anatomy, physiology and behavioral roles. These cellular phenotypes are enacted and maintained by a complex molecular program, including the abundance of gene transcripts, i.e., the transcriptome, and epigenetic modifications of DNA, i.e., the epigenome. Single-cell sequencing assays, capable of measuring the entire transcriptome or epigenome for hundreds of thousands of single cells, have enabled the systematic characterization of brain cell types at unprecedented scale and with fine granularity. However, it is challenging to integrate diverse datasets, which differ in sample and library preparation, sequencing platforms, and assay modalities, into a consistent biological understanding of cell type organization. This thesis presents novel computational algorithms to integrate brain cell transcriptomes and epigenomes. We developed SingleCellFusion, which integrates disparate datasets into a common feature space based on a constrained k-nearest-neighbor graph algorithm. Using SingleCellFusion, we integrated 8 datasets with >400,000 cells from the mouse primary motor cortex (MOp). This analysis identified 56 neuronal cell types with consistent cell-type-specific patterns of gene expression, chromatin accessibility and DNA methylation. To validate the accuracy of SingleCellFusion, we helped to develop a novel multimodal sequencing assay, snmCAT-seq, that simultaneously measures methylCytosine (mC), chromatin Accessibility (A), and Transcriptome (T) from the same cells. Applying snmCAT-seq to 3,898 human frontal cortex cells, we identified fine-grained neuronal cell types. SingleCellFusion integrated single-cell transcriptomes and DNA methylomes from the same cell types with 62.6% to 87.3% accuracy, recapitulating snmCAT-seq results at the cell type level.
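
The constrained k-nearest-neighbor idea mentioned above can be illustrated with a minimal sketch. This is not the SingleCellFusion implementation: it assumes that cells from two modalities have already been placed in a comparable feature space, and the function names (`constrained_knn`, `impute`), the greedy assignment strategy, and the `capacity` parameter are all hypothetical choices made for illustration. The constraint shown here caps how many times any one target cell may be chosen as a neighbor, so that a few "hub" cells cannot dominate the graph.

```python
import numpy as np

def constrained_knn(source, target, k=2, capacity=4):
    """Greedy capacity-constrained kNN (illustrative assumption, not the thesis code).

    Each source cell receives up to `k` distinct target neighbours, and no
    target cell is used as a neighbour more than `capacity` times.
    """
    n_src, n_tgt = source.shape[0], target.shape[0]
    # Pairwise squared Euclidean distances, shape (n_src, n_tgt).
    dist = ((source[:, None, :] - target[None, :, :]) ** 2).sum(axis=2)
    used = np.zeros(n_tgt, dtype=int)          # how often each target was picked
    neighbors = [[] for _ in range(n_src)]
    # Visit every (source, target) pair from closest to farthest.
    for flat in np.argsort(dist, axis=None):
        i, j = divmod(int(flat), n_tgt)
        if len(neighbors[i]) < k and used[j] < capacity:
            neighbors[i].append(j)
            used[j] += 1
    return neighbors, used

def impute(neighbors, target_features):
    """Average the target-modality features of each source cell's neighbours."""
    return np.vstack([target_features[idx].mean(axis=0) for idx in neighbors])
```

In this sketch, `impute` gives every source cell a profile in the other modality, which is one simple way to bring disparate datasets into a common feature space. The greedy pass only guarantees `k` neighbors per cell when `capacity` is large enough relative to `k` and the dataset sizes.
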
Cell-type-specific gene expression is in part regulated by epigenetic modifications of DNA at cis-regulatory elements (CREs), which are typically located thousands of base pairs away from the gene they regulate. We took advantage of co-variations in gene expression and epigenetic activity at candidate CREs across cell types to identify brain cell-type-specific gene-CRE associations. We developed a method that identified more than 10,000 robust gene-CRE associations from mouse MOp, using an empirical data shuffling procedure to control for false positives due to gene co-expression. Our results highlight the power of integrating transcriptomes and epigenomes to uncover the complex molecular regulation of brain cell types, and will directly enable the design of reagents to target specific cell types for functional analysis. They also demonstrate that robust and efficient computational analysis methods are imperative to distill biological understanding from disparate large-scale single-cell sequencing data.
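
To make the shuffling idea above concrete, the following is a minimal sketch of one way an empirical null could be built for gene-CRE correlations. It is an assumption-laden illustration, not the procedure used in the thesis: the abstract does not specify what is shuffled, so this sketch assumes that gene labels are permuted across candidate pairs, and it uses Pearson correlation across cell types. The function names (`rowwise_corr`, `shuffled_null_pvalues`) and the parameter `n_shuffles` are hypothetical.

```python
import numpy as np

def rowwise_corr(x, y):
    """Pearson correlation between matching rows of x and y."""
    x = x - x.mean(axis=1, keepdims=True)
    y = y - y.mean(axis=1, keepdims=True)
    num = (x * y).sum(axis=1)
    den = np.sqrt((x ** 2).sum(axis=1) * (y ** 2).sum(axis=1))
    return num / den

def shuffled_null_pvalues(gene_expr, cre_signal, pairs, n_shuffles=100, seed=0):
    """Empirical p-values for gene-CRE pairs against a shuffled-pair null.

    gene_expr:  (n_genes, n_cell_types) array
    cre_signal: (n_cres, n_cell_types) array
    pairs:      (n_pairs, 2) integer array of (gene index, CRE index)
    Assumption: the null pairs each CRE with a randomly reassigned gene.
    """
    rng = np.random.default_rng(seed)
    g, c = pairs[:, 0], pairs[:, 1]
    observed = rowwise_corr(gene_expr[g], cre_signal[c])
    # Null distribution: correlations after permuting which gene goes with which CRE.
    null = np.concatenate([
        rowwise_corr(gene_expr[rng.permutation(g)], cre_signal[c])
        for _ in range(n_shuffles)
    ])
    null_abs = np.sort(np.abs(null))
    # Count null correlations at least as large in magnitude as each observed one.
    n_ge = null_abs.size - np.searchsorted(null_abs, np.abs(observed), side="left")
    pvals = (n_ge + 1) / (null_abs.size + 1)
    return observed, pvals
```

Because the shuffled pairs keep the real genes and real CREs, any correlation that arises simply from broad co-expression across cell types also appears in the null, which is the kind of false positive such a procedure is meant to control.
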
