Single-cell genomics is a rapidly advancing field that is yielding unprecedented insights into complex biological systems, ranging from diverse microbial populations to early mammalian embryogenesis. This dissertation has contributed to the field by presenting two techniques that can be used to study cell differentiation and tissue development: (1) a mathematical model to reconstruct cellular lineage in pre-implantation embryos, and (2) an efficient mRNA enrichment method for prokaryotic cells, which can also be used to enrich for mammalian non-coding RNA (ncRNA).
Cellular lineage reconstruction: Lineage reconstruction is central to understanding tissue development and maintenance. While various tools to infer cellular relationships have been established, these methods typically require genetic modification and offer only clonal resolution. This dissertation introduced scPECLR, a probabilistic algorithm that infers lineages at the resolution of a single cell division from an endogenous epigenetic mark, 5-hydroxymethylcytosine (5hmC), measured in single cells. When applied to 8-cell mouse embryos, scPECLR predicted the full lineage trees with greater than 95% accuracy. Furthermore, a protocol to detect both 5hmC and genomic DNA from the same single cell was developed. Information from genomic DNA, in combination with scPECLR, could allow cellular lineages to be identified more accurately and the reconstruction to be expanded to even larger trees. Because scPECLR achieves high accuracy using only endogenous marks, the method could be directly extended to study human development.
Low-input bacterial mRNA sequencing: RNA sequencing is a powerful approach to quantify the genome-wide distribution of mRNA molecules in a population and thereby gain an understanding of cellular functions and phenotypes. However, compared to mammalian cells, mRNA sequencing of bacterial samples is more challenging because bacterial cells contain 100-fold less RNA and their mRNA lacks the poly(A) tail that typically enables mRNA enrichment. To overcome these limitations, an effective mRNA enrichment method called EMBR-seq was introduced. With this method, greater than 90% of the sequenced E. coli RNA reads derived from mRNA, which originally accounted for less than 5% of total RNA in a cell. Moreover, EMBR-seq successfully quantified mRNA from 20 picograms of total RNA, an input 500-fold lower than that required by existing commercial kits. In addition, EMBR-seq can be combined with an orthogonal rRNA depletion method based on RNase H to further improve the efficiency of mRNA enrichment. Due to its simplicity and efficiency, EMBR-seq could potentially be extended to single-cell resolution to advance bacterial mRNA sequencing and to investigate gene expression patterns in non-model microbial species.
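To make the figures quoted above concrete, the arithmetic they imply can be written out as a short sketch. The function names are hypothetical and introduced only for illustration; the inputs are the values stated in the text (less than 5% mRNA before enrichment, greater than 90% after, 20 picograms of input, 500-fold lower than commercial kits). Because the text gives these as bounds, the computed enrichment is a lower bound rather than a measured value.

```python
def fold_enrichment(fraction_before: float, fraction_after: float) -> float:
    """Ratio of the mRNA read fraction after enrichment to the fraction before."""
    return fraction_after / fraction_before


def implied_reference_input_pg(low_input_pg: float, fold_lower: float) -> float:
    """Input mass (pg) implied for a method that needs `fold_lower` times more RNA."""
    return low_input_pg * fold_lower


# Values taken from the text: mRNA is < 5% of total RNA and > 90% of reads
# after enrichment, so the true fold enrichment is at least this ratio.
min_fold = fold_enrichment(0.05, 0.90)

# 20 pg of total RNA is stated to be 500-fold below commercial kit requirements.
kit_input_pg = implied_reference_input_pg(20, 500)

print(f"Minimum mRNA fold enrichment: {min_fold:.0f}x")
print(f"Implied commercial kit input: {kit_input_pg / 1000:.0f} ng")
```

Under these stated bounds, the mRNA fraction increases by at least 18-fold, and the implied commercial kit requirement is 10 nanograms of total RNA.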
Mammalian non-coding RNA sequencing: Despite being highly investigated, mRNA accounts for less than 5% of the mammalian RNA that is transcribed. Many types of non-coding RNA perform various functions, including transcriptional regulation. Similar to bacterial mRNA, mammalian ncRNA lacks a poly(A) tail. EMBR-seq was shown to deplete mammalian rRNA and to increase the number of unique ncRNAs detected.