eScholarship
Open Access Publications from the University of California

UC Riverside

UC Riverside Electronic Theses and Dissertations

Deep Learning Methods for High-Resolution Functional Annotation and Discovery of Novel Connections Between Gene Sets

Creative Commons 'BY-NC' version 4.0 license
Abstract

Proteins are essential to life, and a precise understanding of protein functions is critical to addressing many biomedical questions. Different protein isoforms can be produced from a single gene through alternative splicing, which greatly expands the diversity of proteins and the complexity of cellular functions. However, precise annotations that differentiate the functions of isoforms remain scarce. Separately, effective modeling of functional knowledge can empower computational methods in many biological applications. A fundamental step in such applications is the discovery of gene sets. Methods that can accurately map genotypes to phenotypes are needed for detecting novel connections between gene sets derived from different experiments, which could enable new biological discoveries.

With the accumulation of large-scale biological data, deep learning applications to biological data analysis are flourishing. In this dissertation, we propose three deep learning methods for two related problems in functional genomics: producing high-resolution functional annotation at the isoform level, and discovering connections between experimentally derived gene sets via functional knowledge. First, we design DIFFUSE, which for the first time integrates isoform sequences and expression profiles to systematically predict isoform functions by combining deep learning with probabilistic graphical models. Second, to enhance the prediction of isoform functions, we propose FINER, which jointly predicts isoform functions and isoform-isoform interactions through a unified learning objective, enabling the two tasks to benefit from each other. Finally, we develop FEGS, a representation learning approach based on hypergraph embedding, which embeds gene sets as compact features that encode the functional information of their member genes and facilitates gene set comparison through more sensitive detection of common phenotypes. DIFFUSE and FINER significantly outperform existing isoform function prediction methods, and their predictions are validated by independent biological data. FEGS has been successfully applied to drug discovery and cell type identification.
