Deep Learning Algorithms for Predicting Histone Post-Translational Modifications and Single Guide RNA CRISPR Efficiency
eScholarship
Open Access Publications from the University of California

UC Riverside

UC Riverside Electronic Theses and Dissertations

Abstract

In this dissertation, we investigate two problems in computational biology that can be solved using machine learning methods, specifically deep learning architectures.

In the first, we study the problem of predicting histone post-translational modifications (PTMs) from transcription factor binding data and the primary DNA sequence. Histone PTMs are involved in a variety of essential regulatory processes in the cell, including transcription control. Here we introduce a deep learning architecture called DeepPTM for predicting histone PTMs. Extensive experimental results show that DeepPTM achieves higher prediction accuracy than the model proposed in Benveniste et al. (PNAS, 2014) and DeepHistone (BMC Genomics, 2019). The competitive advantage of our framework lies in combining deep learning with an effective pre-processing step. Our classification framework has also enabled the discovery that a small subset of transcription factors (specific to each histone PTM and cell type) can provide almost the same prediction accuracy as the data for all transcription factors.

In the second, we investigate the problem of predicting single guide RNA (sgRNA) CRISPR-Cas9 and CRISPR-Cas12a activity from the primary sequence of the sgRNA. A negative selection screen in the absence of non-homologous end-joining (the dominant DNA repair mechanism) is used to generate sgRNA activity profiles for both SpCas9 and LbCas12a in the non-conventional yeasts Yarrowia lipolytica and Kluyveromyces marxianus. These genome-wide data serve as input to a deep learning algorithm, DeepGuide, that is able to accurately predict guide activity. DeepGuide uses unsupervised learning to obtain a compressed representation of the genome, followed by supervised learning to map sgRNA sequence, genomic context, and epigenetic features to guide activity. Experimental validation, both genome-wide and with a subset of selected genes, confirms DeepGuide’s ability to accurately predict high-activity sgRNAs. We also show that the prediction accuracy of DeepGuide can be further improved by incorporating sgRNA samples from different screening conditions of the genome-wide library, based on carbon source (glucose, xylose, and lactose) and the temperature at which the non-conventional yeast is grown. To the best of our knowledge, our method is the first sgRNA predictive tool that employs guides from different screening conditions to improve prediction performance.
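The abstract does not say how sgRNA sequences are presented to the network. One common choice for sequence-based deep learning models is one-hot encoding, and the sketch below illustrates that idea only. The function name, the nucleotide ordering, and the example guide are assumptions made for illustration and are not taken from DeepGuide.

```python
# Illustrative sketch only: not the DeepGuide implementation.
# NUCLEOTIDES ordering and one_hot_encode are hypothetical names.
NUCLEOTIDES = "ACGT"


def one_hot_encode(guide):
    """Return a len(guide) x 4 matrix with a single 1 per row (A, C, G, T order)."""
    # Treat RNA-style input (U) as its DNA equivalent (T).
    guide = guide.upper().replace("U", "T")
    matrix = []
    for base in guide:
        if base not in NUCLEOTIDES:
            raise ValueError(f"unexpected base {base!r}")
        row = [0] * len(NUCLEOTIDES)
        row[NUCLEOTIDES.index(base)] = 1
        matrix.append(row)
    return matrix


# Made-up 20-nt guide, used only to show the shape of the encoding.
example_guide = "GACGTTAGCCATGGTACCAT"
encoded = one_hot_encode(example_guide)
print(len(encoded), len(encoded[0]))
```

A matrix of this shape (guide length by four) is the kind of input a convolutional or fully connected layer could consume; genomic context or epigenetic features would need their own encoding.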
