eScholarship
Open Access Publications from the University of California

UC Santa Cruz

UC Santa Cruz Electronic Theses and Dissertations

Computational methods for deducing biological processes involved in wound healing based on gene analysis

Abstract

The Gene Ontology (GO) is a set of uniquely identified biological processes, each defined by a set of genes and organized hierarchically. Overrepresentation analysis is commonly used to determine the statistically significant GO terms assigned to a list of genes. However, this method has drawbacks when used to identify the most significant biological processes from a list of differentially expressed genes obtained from microarray data. Namely, many GO terms are highly overlapping, and many are too vague or too specific to provide meaningful interpretation. In this work, I develop a pipeline to derive a shortlist of GO terms from the results of overrepresentation analysis. I do this in two steps, "representation filtering" and "similarity filtering." First, I use information theory to quantify the specificity of GO terms, and define metrics to quantify the representation of GO terms in the overall dataset. These metrics are used to reduce the list in the representation filtering step. Second, I obtain pairwise similarity scores of GO terms from NaviGO, and use these scores to perform the similarity filtering step, which eliminates redundancy in the list. This pipeline is applied to overrepresentation analysis of time-series transcriptomic data from wound healing in mice and humans. By analyzing the resulting lists of GO terms at each time point measured, I show that the shortlists significantly reduce GO list size, yet provide concise descriptions of the expected wound healing stages. The main conclusions of this study are that significant overlap between inflammation and proliferation is evident; that proliferation-related processes are more pronounced and varied in humans than in mice; and that inflammation is relatively consistent across datasets, but may appear to be prolonged depending on the thresholds set for differential expression of genes. This method provides a tool that allows data from transcriptomic studies to be used in translational research.
In future work, the tool may be used for other experiments involving time-series transcriptomic data.
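
The two filtering steps can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract does not specify the metrics, so the code assumes that specificity is measured as information content, IC = -log2(p), where p is the fraction of background genes annotated to a term, and that representation is the fraction of differentially expressed genes hitting a term. The function names, thresholds, term labels, and the greedy keep-the-most-significant rule are all hypothetical, and the similarity scores are supplied by hand rather than retrieved from NaviGO.

```python
import math

def information_content(annotated, total_genes):
    # Assumed specificity metric: IC = -log2(fraction of genes annotated).
    return -math.log2(annotated / total_genes)

def representation_filter(terms, total_genes, n_de_genes,
                          min_ic, max_ic, min_coverage):
    # Keep terms that are neither too vague (low IC) nor too specific
    # (high IC) and that cover enough of the differentially expressed genes.
    kept = {}
    for name, t in terms.items():
        ic = information_content(t["annotated"], total_genes)
        coverage = t["hits"] / n_de_genes  # hypothetical representation metric
        if min_ic <= ic <= max_ic and coverage >= min_coverage:
            kept[name] = t
    return kept

def similarity_filter(terms, similarity, threshold):
    # Greedy redundancy removal (an assumed strategy): visit terms from
    # most to least significant, and keep a term only if it is not too
    # similar to any term that has already been kept.
    shortlist = []
    for name in sorted(terms, key=lambda n: terms[n]["pvalue"]):
        redundant = any(
            similarity.get(frozenset((name, other)), 0.0) >= threshold
            for other in shortlist
        )
        if not redundant:
            shortlist.append(name)
    return shortlist

# Made-up input: placeholder term labels, not real GO identifiers.
example_terms = {
    "broad_term":  {"annotated": 512, "hits": 40, "pvalue": 0.001},
    "mid_term_1":  {"annotated": 32,  "hits": 10, "pvalue": 0.0005},
    "mid_term_2":  {"annotated": 16,  "hits": 8,  "pvalue": 0.002},
    "narrow_term": {"annotated": 1,   "hits": 1,  "pvalue": 0.01},
}
example_similarity = {frozenset(("mid_term_1", "mid_term_2")): 0.9}

represented = representation_filter(
    example_terms, total_genes=1024, n_de_genes=100,
    min_ic=2.0, max_ic=8.0, min_coverage=0.05,
)
shortlist = similarity_filter(represented, example_similarity, threshold=0.7)
print(shortlist)  # ['mid_term_1']
```

In this toy input, the broad and narrow terms are removed by the representation step, and the second mid-level term is removed as redundant with the first, which mirrors how the pipeline shrinks a long overrepresentation result to a short, non-overlapping list.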
