Leveraging Chemical Structures and Molecular Information with Interpretable Deep Learning
eScholarship
Open Access Publications from the University of California

UC Santa Cruz

UC Santa Cruz Electronic Theses and Dissertations

Abstract

As biological data become more readily available and convoluted, equally involved methods are needed to predict and understand outcomes in biological systems. Classical machine learning methods are not well suited for prediction tasks that need to integrate heterogeneous sources of information to predict the target variable. Deep learning is capable of integrating such disparate inputs with impressive results. Here, I present my work on integrating cancer cell line transcriptomic information with the chemical structure information of the perturbagen the cell lines were treated with (Chapter 2). This work leverages recent developments in deep learning for aligning domains (here, cell lines and patients) in a data-driven way, together with advanced featurization of molecules. I show that integrating these methods improves the prediction of drug response in patients compared to more conventional methods. This model can be used to identify therapies for a patient using only transcriptomic and chemical information. I further present my work on applying chemical featurization to nanopore sequences to model nucleotide modifications de novo (Chapter 3). Given the polynomial nature of possible modifications, producing gold standard data to identify such events is a daunting task. I show that knowledge learned on chemical features in the canonical (unmodified) context can be transferred to identify nucleotide modifications with a high degree of accuracy. Finally, in Chapter 4, I present a collaborative work on developing an interpretable deep learning model for identifying the activation of biological pathways following the application of a perturbagen. We show that by guiding the flow of information through the neural network, we can extract more biologically meaningful information following perturbation than with more classical methods such as Gene Set Enrichment Analysis (GSEA). This model can be used to circumvent costly and time-consuming experiments by identifying the pathways altered when a perturbagen is applied to any biological system.
