eScholarship
Open Access Publications from the University of California

UCLA

UCLA Electronic Theses and Dissertations

Methods and applications of integrating single nucleus and bulk tissue RNA sequencing

Abstract

Obesity typically precedes and accompanies the development of cardiometabolic diseases (CMD) that lead to increased morbidity and mortality. One of these disorders is non-alcoholic fatty liver disease (NAFLD), which encompasses a spectrum of varying degrees of fat accumulation and inflammation in the liver. More severe forms of NAFLD, such as non-alcoholic steatohepatitis (NASH), lead to a higher risk of developing hepatocellular carcinoma (HCC), the most prevalent form of liver cancer. Adipose tissue dysfunction in obesity can lead to increased circulating free fatty acids, and thus to ectopic lipid deposition in the liver. Left unchecked, lipotoxicity in the liver can result in inflammation, cell death, fibrosis, and ultimately the development of HCC. In both adipose and liver tissues, non-parenchymal cells, such as vascular and immune cell-types, play important roles in the normal function of these tissues and the pathophysiology of obesity, NAFLD, and HCC. Studying all cell-types in these tissues together, rather than in isolation, would therefore greatly enhance our understanding of these common obesity-related diseases.

Single-cell technologies, such as single-cell RNA-sequencing (scRNA-seq), assay individual cells and provide an excellent tool to study cell-type changes. While these approaches provide high resolution, they are currently costly and low-throughput. Traditional methods that measure molecular phenotypes at the tissue level are therefore still more practical. These methods measure a composite of all cells present in the sample or biopsy, making it inherently uncertain whether observed results reflect changes in cell-type composition, changes within cells, or both. Given these limitations, I aimed to integrate bulk-tissue RNA-sequencing (RNA-seq) and scRNA-seq data to leverage the larger sample sizes of bulk RNA-seq and the higher resolution of scRNA-seq.

The application of single-cell technologies is especially promising for biobanks, which can contain multiple levels of data on participants that help uncover novel associations. Biobank tissues are typically stored frozen, however, which usually requires preparing nuclei suspensions for single-nucleus RNA-seq (snRNA-seq) rather than the whole cells typically used for scRNA-seq. This presents challenges for current droplet-based technologies: RNA from the ambient pool of lysed cells and nuclei can be encapsulated into droplets, confounding results. In Chapter 2, I present a computational method to remove empty droplets from gene expression data (Alvarez et al. 2020). This allows for cleaner downstream data analysis by ensuring that only droplets containing nuclei or cells are used.

Because current scRNA-seq technologies are low-throughput, their application to population-based studies and cohorts is limited, whereas bulk-tissue RNA-seq data are typically available at larger sample sizes. In Chapter 3, I developed a method to help address this methodological gap. This approach, called Bisque (Jew et al. 2020), estimates cell-type composition in bulk RNA-seq data sets using single-cell level reference data from the same tissue. The estimated cell-type proportions can be associated with sample-level data to uncover relevant cell-types, or they can be included as covariates in a model to reduce confounding caused by cell-type heterogeneity. One advantage of our method is that it requires only a minimum amount of information in the form of cell-type markers. This makes it attractive for existing data sets, which may not have accompanying single-cell level RNA-seq data.

In the fourth chapter of this dissertation, I present our application of snRNA-seq to HCC. Carcinomas, such as HCC, are typically characterized by high tissue heterogeneity. Larger scale cancer cohorts usually lack single-cell level data, making interpretation of bulk-tissue results challenging. Here, I integrated HCC single-cell level experiments with relatively large HCC case-control bulk RNA-seq cohorts. The results from these analyses highlighted the role that proliferating cells play in HCC (Alvarez et al. 2022). These cycling cells were highly enriched in cancer tissue, as expected, and were consistently prognostic of poor survival outcomes in two independent cohorts. Furthermore, we observed that individuals with TP53 mutations had higher levels of these proliferating cells. Thus, our integration helped to interpret tumor gene expression changes as cell-type composition changes.

In the fifth chapter, I present our human adipose tissue snRNA-seq results, showing changes in obesity and insulin resistance (Alvarez et al., manuscript in preparation). We applied multiplexing to increase our snRNA-seq sample size to roughly 100 subcutaneous adipose samples and over 100,000 nuclei, providing unprecedented resolution of human adipose tissue. This allowed us to identify finer resolution subcell-types, or cell states, which are more challenging to study as they are lower in frequency and exhibit more subtle differences. In addition to substantiating previous findings, we identified subcell-types associated with CMD. We then applied integrative approaches to corroborate these cell state changes in adipose bulk RNA-seq. Overall, our results show that both main cell-type and subcell-type variations are associated with metabolic traits.

In summary, this dissertation presents my work on the integration of snRNA-seq and bulk-tissue RNA-seq to leverage distinct advantages provided by each. This has allowed us to gain a better understanding of the origin of gene expression changes in CMD.
