In this dissertation, we seek a simple and unified probabilistic model, empowered by modern neural networks and computing hardware, that is versatile enough to model patterns of high dimensionality and complexity in various domains such as natural images and natural language. We achieve this goal by studying three families of probabilistic models and proposing a unification of them, which leads to a simple yet versatile model with rich applications across domains.
In the modern deep learning era, three families of probabilistic models are widely used to model complex patterns. The first family is the generator model, which assumes that the observed example is generated from a low-dimensional latent vector via a top-down network, with the latent vector following a non-informative prior distribution. The second family is the energy-based model (EBM), which specifies a probability distribution of the observed example through an energy function defined on the observed example and parameterized by a bottom-up deep network. The third family is the discriminative model, which takes the form of a classifier and specifies the conditional probability of the output class label given an input signal.
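To fix ideas, the three families can be sketched as follows; the symbols $g_\beta$, $f_\alpha$, and $C_\phi$ below are illustrative placeholders rather than the exact notation fixed later in this dissertation.
\[
\text{Generator model:}\quad z \sim \mathcal{N}(0, I_d), \qquad x = g_\beta(z) + \epsilon, \quad \epsilon \sim \mathcal{N}(0, \sigma^2 I_D);
\]
\[
\text{Energy-based model:}\quad p_\alpha(x) = \frac{1}{Z(\alpha)} \exp\big(f_\alpha(x)\big), \qquad Z(\alpha) = \int \exp\big(f_\alpha(x)\big)\, dx;
\]
\[
\text{Discriminative model:}\quad p_\phi(y \mid x) = \frac{\exp\big(C_\phi(x)[y]\big)}{\sum_{y'} \exp\big(C_\phi(x)[y']\big)}.
\]
Here $z \in \mathbb{R}^d$ is a latent vector with $d$ much smaller than the data dimension $D$, and $y$ is a class label.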
The EBM is expressive but poses challenges in sampling, since the energy function defined in the data space has to be highly multi-modal in order to fit the usually multi-modal data distribution; the generator model is relatively less expressive but convenient and efficient for sampling owing to its simple factorized form. We first integrate these two models. In particular, we propose to learn an EBM in the latent space as the prior distribution of the generator model, following the philosophy of empirical Bayes. We call the proposed model the latent space energy-based model, which consists of the energy-based prior model and the top-down generation model. Due to the low dimensionality of the latent space, a simple energy function in the latent space can capture regularities in the data effectively. Thus, the resulting model is much more expressive than the original generator model at little cost in model complexity and computational complexity. Moreover, MCMC sampling in the latent space is much more efficient and mixes better than sampling in the observed data space. Furthermore, we introduce a principled learning algorithm, formulated as a perturbation of maximum likelihood learning in terms of both the objective function and the estimating equation, so that the learning algorithm has a solid theoretical foundation.
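A sketch of this formulation is given below; the symbols $f_\alpha$, $p_\beta$, and $p_0$ are introduced here for illustration and stand for the latent-space energy function, the top-down generation model, and the non-informative base prior, respectively.
\[
p_\alpha(z) = \frac{1}{Z(\alpha)} \exp\big(f_\alpha(z)\big)\, p_0(z), \qquad
p_\theta(x, z) = p_\alpha(z)\, p_\beta(x \mid z), \qquad
p_\theta(x) = \int p_\alpha(z)\, p_\beta(x \mid z)\, dz,
\]
with $\theta = (\alpha, \beta)$. The maximum likelihood learning gradients then take the form
\[
\nabla_\alpha \log p_\theta(x) = \mathbb{E}_{p_\theta(z \mid x)}\big[\nabla_\alpha f_\alpha(z)\big] - \mathbb{E}_{p_\alpha(z)}\big[\nabla_\alpha f_\alpha(z)\big], \qquad
\nabla_\beta \log p_\theta(x) = \mathbb{E}_{p_\theta(z \mid x)}\big[\nabla_\beta \log p_\beta(x \mid z)\big].
\]
In this sketch, both expectations are over distributions defined in the low-dimensional latent space and can be approximated by short-run MCMC; replacing exact posterior and prior samples with such approximate samples is what yields the perturbation of maximum likelihood learning mentioned above.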
We verify the proposed model and learning algorithm on a variety of image and text datasets, such as human faces and financial news, and the model effectively learns from these high-dimensional and complex datasets. As a result, we can draw faithful and diverse samples from the learned models. We also find that the well-learned model induces a discriminative latent space that separates the probability densities of normal and anomalous data, naturally making the model a tool for anomaly detection.
Having established the effectiveness of the proposed latent space EBM and its learning algorithm, we explore two applications that leverage two respective aspects of the latent space EBM. In the first application, we exploit the expressiveness of the latent space EBM and use it to model molecules encoded in a simple format of linear strings. Despite its convenience, models relying on this simple representation tend to generate invalid samples and duplicates. Owing to its expressiveness, a latent space EBM learned on molecules in this simple and convenient representation generates molecules whose validity, diversity, and uniqueness are competitive with state-of-the-art models, and the generated molecules have structural and chemical features whose distributions almost perfectly match those of the real molecules. In the second application, we explore the view of the EBM as a cost function and make a connection with inverse reinforcement learning for diverse human trajectory forecasting. The cost function is learned from expert demonstrations projected into the latent space. To make a forecast, we optimize the cost function to obtain a belief vector, which is then mapped to the trajectory space by a policy network; see the sketch below. The proposed model makes accurate, multi-modal, and socially compliant trajectory predictions.
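The forecasting step can be sketched schematically as follows; the symbols $E_\alpha$, $\pi_\beta$, and $c$ (denoting the learned latent-space cost, the policy network, and the observed context) are illustrative placeholders rather than the exact notation used later.
\[
z^\ast = \arg\min_{z} E_\alpha(z \mid c), \qquad \hat{x}_{1:T} = \pi_\beta(z^\ast, c),
\]
where the belief vector $z^\ast$ is obtained by optimizing the learned cost function in the latent space, and the policy network $\pi_\beta$ maps it to a predicted trajectory $\hat{x}_{1:T}$; drawing multiple low-cost beliefs yields multi-modal forecasts.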
Building on the unification of the generator model and the EBM, we further integrate the discriminative model into the latent space EBM via an energy term that couples a continuous latent vector and a symbolic one-hot vector. With such a coupling formulation, the discrete category can be inferred from the observed example based on the continuous latent vector. Moreover, the latent space coupling naturally enables the incorporation of information bottleneck regularization, which encourages the continuous latent vector to extract information from the observed example that is informative of the underlying category. In our learning method, the symbol-vector coupling, the generator network, and the inference network are learned jointly. The model can be learned in either an unsupervised setting or a semi-supervised setting where category labels are provided for a subset of the training examples. With the symbol-vector coupling, the learned latent space is well-structured, so that the generator generates text with high quality and interpretability, and the model performs well on classification tasks with a limited amount of labeled data.
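A sketch of the coupling is as follows; the symbols $y$, $z$, and $f_\alpha$ (the one-hot symbolic vector, the continuous latent vector, and the coupling energy network) are again illustrative.
\[
p_\alpha(y, z) = \frac{1}{Z(\alpha)} \exp\big(\langle y, f_\alpha(z) \rangle\big)\, p_0(z), \qquad
p_\alpha(y \mid z) = \frac{\exp\big(f_\alpha(z)[y]\big)}{\sum_{y'} \exp\big(f_\alpha(z)[y']\big)},
\]
so that, given the continuous latent vector $z$, the symbolic one-hot vector $y$ follows a softmax classifier over the coupling energies; this is how the discrete category is inferred from an observed example through its inferred latent vector.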