eScholarship
Open Access Publications from the University of California

UC San Diego

UC San Diego Electronic Theses and Dissertations

Enhancing Natural Products Structural Dereplication and Elucidation with Deep Learning Based Nuclear Magnetic Resonance Techniques

Abstract

Natural Products Research (NPR) has a long history of revealing bioactive constituents of natural origin, both as single drug leads within modern Western medicine and as mixtures of bioactive constituents enriching traditional medicines. Identifying bioactive constituents in complex mixtures, such as extracts of marine algae, has relied on multidisciplinary techniques such as bioactivity-guided or spectroscopy-guided fractionation and purification. Milestones in NPR have accordingly been marked by the application of novel technologies, such as improved separation and purification, spectroscopic hardware with detection limits at natural abundance, software algorithms for accelerating data collection and processing, and high-throughput screening.

In most NPR, the characterization of novel compounds and the dereplication of known compounds both entail the collection and analysis of NMR spectra. This involves running 1D and 2D NMR experiments for partial structure construction, assembly, and relative stereochemistry determination. As rapid genetic and proteomic approaches have made their way into NPR, conventional NMR practices have become one of several bottlenecks in the characterization and dereplication of new compounds. To address this challenge, we leveraged the advantages of Non-Uniform Sampling Nuclear Magnetic Resonance (NUS NMR) and Artificial Intelligence (AI) to create Small Molecule Accurate Recognition Technology (SMART), a tool to speed up marine natural products discovery. Fast NMR techniques like NUS NMR have the potential to further reduce detection limits while maintaining the same sampling time and quality. We then used over 4000 experimental Heteronuclear Single Quantum Coherence (HSQC) spectra to train the AI. The resulting algorithm provided structurally insightful embedding maps, with nodes and clusters representing correlations among related families of natural products. By testing different HSQC spectra with this algorithm, we can greatly accelerate the identification of known compounds and rapidly generate hypotheses about the relationship of new molecules to those used for training, based entirely on their NMR properties. Specifically, the 2D NMR spectra of a series of unknown compounds isolated from two different marine cyanobacteria were recognized by SMART as belonging to a specific class of marine depsipeptides.
