eScholarship
Open Access Publications from the University of California

UC San Diego

UC San Diego Electronic Theses and Dissertations

Expanding the applicability of gene set enrichment analysis with data-driven gene set refinement and adaptation for sparse data

No data is associated with this publication.
Abstract

The study of gene expression, the abundance of RNA transcripts in cells, has played a critical role in expanding knowledge of the molecular underpinnings of cancer and the ways in which treatments affect it. Because genes operate in interlinking pathways with nuanced and context-dependent behavior, deriving the phenotypic implications of changes in the abundance of individual genes can be challenging. Gene Set Enrichment Analysis (GSEA) was developed for this reason. GSEA is a statistical method that quantifies the activation of pathways or processes as represented by a priori annotated gene sets, and it plays a critical role in constructing pathway-level understandings of cell states in diseases such as cancer. In the use of GSEA and the expansion of its companion database of gene sets, the Molecular Signatures Database (MSigDB), two challenges have emerged. First, some gene sets lack the properties of context-specificity and coordinate regulation; that is, not all of their members are collectively more highly expressed in samples that have the specific phenotype to which the gene set should correspond. Second, new technologies for measuring gene expression at single-cell resolution yield data with different properties than the “bulk” gene expression data for which GSEA was initially developed. I address the first challenge in Chapter 1, in which I propose a data-driven method for gene set refinement and show that this method yields new gene sets that are both more coordinately regulated and more context-specific. I address the second challenge in Chapter 2, in which I characterize the performance of the single-sample version of GSEA (ssGSEA) in multiple datasets and propose adaptations that I show produce enrichment scores that are more stable and certain in the single-cell context. Finally, Chapter 3 applies gene set-based pathway characterization to the problem of understanding the development of chemoresistance in ovarian cancer. Using a cell cycle-aware approach incorporating ssGSEA, we propose a new framework for modeling the development of resistance to carboplatin. Together, these analyses lay the groundwork for improving the robustness of GSEA results, adapting GSEA to emerging technologies for gene expression measurement, and applying it to address key challenges in the treatment of cancer.

Main Content

This item is under embargo until April 12, 2025.