eScholarship
Open Access Publications from the University of California

UCSF

Adding Stochastic Negative Examples into Machine Learning Improves Molecular Bioactivity Prediction

Published Web Location

https://pubs.acs.org/doi/10.1021/acs.jcim.0c00565
No data is associated with this publication.
Creative Commons 'BY' version 4.0 license
Abstract

Multitask deep neural networks learn to predict ligand-target binding by example, yet public pharmacological data sets are sparse, imbalanced, and approximate. We constructed two hold-out benchmarks to approximate temporal and drug-screening test scenarios, whose characteristics differ from a random split of conventional training data sets. We developed a pharmacological data set augmentation procedure, Stochastic Negative Addition (SNA), which randomly assigns untested molecule-target pairs as transient negative examples during training. Under the SNA procedure, drug-screening benchmark performance increases from R² = 0.1926 ± 0.0186 to 0.4269 ± 0.0272 (122%). This gain was accompanied by a modest decrease in the temporal benchmark (13%). SNA increases in drug-screening performance were consistent for classification and regression tasks and outperformed y-randomized controls. Our results highlight where data and feature uncertainty may be problematic and how leveraging uncertainty into training improves predictions of drug-target relationships.
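
The core idea of SNA as stated in the abstract (randomly assigning untested molecule-target pairs as transient negatives during training) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the uniform sampling scheme, the `negative_ratio` parameter, and the placeholder `negative_value` of 0.0 are all assumptions made for illustration, and the toy activity values are invented.

```python
import random


def sample_transient_negatives(known_labels, n_molecules, n_targets,
                               n_negatives, negative_value, rng):
    """Draw untested (molecule, target) pairs and label them as negatives.

    known_labels maps (molecule_index, target_index) -> measured activity.
    Pairs absent from known_labels are treated as untested.
    Uniform sampling is an assumption of this sketch.
    """
    n_untested = n_molecules * n_targets - len(known_labels)
    if n_negatives > n_untested:
        raise ValueError("not enough untested pairs to sample from")
    negatives = {}
    while len(negatives) < n_negatives:
        pair = (rng.randrange(n_molecules), rng.randrange(n_targets))
        # Skip measured pairs so real labels are never overwritten.
        if pair not in known_labels:
            negatives[pair] = negative_value
    return negatives


def epoch_training_set(known_labels, n_molecules, n_targets,
                       negative_ratio, negative_value, rng):
    """Build one epoch's labels: measured data plus freshly sampled negatives.

    negative_ratio (hypothetical) is the number of sampled negatives
    per measured pair.
    """
    n_negatives = int(negative_ratio * len(known_labels))
    negatives = sample_transient_negatives(
        known_labels, n_molecules, n_targets, n_negatives, negative_value, rng)
    # Merge into a new dict; known_labels itself is left unchanged, so the
    # sampled negatives do not persist into the next epoch.
    return {**known_labels, **negatives}


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy data: 3 measured pairs out of 5 molecules x 4 targets.
    known = {(0, 1): 7.2, (2, 3): 5.1, (4, 0): 6.4}
    for epoch in range(3):
        labels = epoch_training_set(known, 5, 4, 1.0, 0.0, rng)
        added = sorted(p for p in labels if p not in known)
        print(f"epoch {epoch}: {len(labels)} labels, negatives at {added}")
```

Because the negatives are resampled on every call, no single untested pair is permanently treated as inactive, which is one way to read "transient" in the abstract. How many negatives to add and what value to assign them are choices this sketch leaves open.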