Single-cell multi-omic analysis of immune cell development
- Author(s): Steier, Zoe R
- Advisor(s): Streets, Aaron; Yosef, Nir; et al.
The continuous differentiation and selection of T cells within the thymus is critical for the maintenance of mammalian adaptive immunity. Yet it is unclear precisely how thymocyte development and fate determination occur to produce T cells with different specified effector functions. Recent technological innovations in microfluidics and genomic sequencing have enabled high-throughput approaches for probing cell identities and development by measuring multiple molecular features in thousands of single cells. However, there has been a lack of computational methods capable of synthesizing this data to form a coherent view of cell identity. Here, I present a new method to analyze multi-omics data, describe how experimental and computational multi-omics analysis can be performed in practice, and apply these approaches to investigate T cell development.
First, I address the task of multi-omics data analysis. The paired measurement of RNA and surface proteins in single cells with CITE-seq is a promising approach to connect transcriptional variation with cell phenotypes and functions. However, combining these paired views into a unified representation of cell state is made challenging by the unique technical characteristics of each measurement. Here I present Total Variational Inference (totalVI), a deep generative model for end-to-end joint analysis of CITE-seq data that probabilistically represents the data as a composite of biological and technical factors including protein background and batch effects. To evaluate totalVI’s performance, I profile immune cells from murine spleen and lymph nodes with CITE-seq, measuring over 100 surface proteins. I demonstrate that totalVI provides a cohesive solution for common analysis tasks like dimensionality reduction, the integration of datasets with different measured proteins, estimation of correlations between molecules, and differential expression testing.
Next, I present a guide for fellow researchers on how single-cell multi-omics analysis of RNA and proteins can be performed in practice. Despite the increasing availability of commercial experimental products and open-source software packages, scientists must still work through many details and practical challenges in order to implement published methods in real-world settings across different biological contexts and experimental designs. Here I provide an overview of the experimental and computational pipelines for single-cell analysis of RNA and proteins. I then describe the practical steps necessary to complete these pipelines, from collecting paired RNA and protein data from single cells to preprocessing and filtering the sequencing data, running the totalVI model, and conducting downstream analysis. I also provide notes on common pitfalls and offer recommendations so that joint analysis of RNA and proteins can be applied widely to other biological systems.
Finally, I apply these methods for single-cell multi-omics analysis to investigate T cell development in the thymus. CD4 and CD8 T cells play a critical role in the mammalian immune system, and understanding their fate decisions during development has broad clinical implications relevant to autoimmune diseases such as type 1 diabetes and to the production of cancer immunotherapies. While the development of CD4 and CD8 T cells within the thymus from the CD4+CD8+ stage has been widely studied as a classic model of lineage determination, the developmental trajectory from immature thymocytes to mature T cells and the mechanism of lineage commitment remain unclear. To deconstruct this developmental process, I apply CITE-seq to simultaneously measure the transcriptome and over 100 surface proteins in thymocytes from wild-type and lineage-restricted mice. Using totalVI, I jointly analyze the paired measurements to build a comprehensive timeline of RNA and protein expression in the CD4 and CD8 lineages. Using lineage-restricted samples, I identify early differences that implicate the calcineurin-NFAT branch of the T cell receptor signaling pathway as a putative driver of lineage commitment. Employing drug perturbations in a neonatal thymic slice system, I validate the requirement of calcium signaling through NFAT for CD4, but not CD8, lineage commitment and shed light on the CD4/CD8 lineage commitment mechanism.