Deep Energy-Based Generative Modeling and Learning
 Xu, Yifei
 Advisor(s): Wu, Ying Nian
Abstract
A generative model, as an unsupervised learning approach, is a promising route to learning meaningful representations without focusing on specific tasks. Finding such generative models is one of the most fundamental problems in statistics, computer vision, and artificial intelligence research. The deep energy-based model (EBM) is one of the most promising candidates. Previous work has demonstrated the capability of EBMs on image domains. In this dissertation, we explore the capability of EBMs in three important domains: unordered set modeling, 3D shape representation, and continuous inverse optimal control. For each domain, we propose a novel EBM-based approach and obtain competitive results.
Originating from statistical physics, an EBM directly defines a probability density as the exponential of a negative energy function, where the energy function maps the input variable to a scalar energy. Training an EBM from observed data entails finding an energy function under which observed data are assigned lower energies than unobserved ones. Given the observed training data, EBMs are trained by maximum likelihood estimation, which leads to an ``analysis by synthesis'' algorithm. The training process iterates the following two steps: (1) Synthesis step: sample data from the current probability distribution using the Markov chain Monte Carlo (MCMC) method. (2) Analysis step: update the model parameters based on the statistical difference between the synthesized data and the observed data. Compared with other commonly used generative models, such as the Generative Adversarial Network (GAN) or the Variational Autoencoder (VAE), the EBM is appealing because (1) the EBM provides an explicit density function for the data; (2) training an EBM does not rely on any auxiliary models; (3) training an EBM does not suffer from mode collapse; and (4) the EBM unifies representation and generation in a single framework.
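The analysis-by-synthesis loop above can be sketched on a toy one-dimensional EBM with a quadratic energy E(x; a) = a·x², for which the answer is known in closed form: standard normal data correspond to a = 0.5, since p(x) ∝ exp(−x²/2). The step sizes, chain lengths, and iteration counts below are illustrative assumptions, not values from the dissertation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Observed data: N(0, 1) samples, for which the quadratic energy
# E(x; a) = a * x^2 has the known answer a = 0.5 (p(x) ∝ exp(-x^2/2)).
x_data = rng.normal(size=1000)

a = 0.1  # illustrative initial value of the energy parameter

def langevin(a, n, steps=100, step_size=0.1):
    """Synthesis step: sample from p(x) ∝ exp(-a x^2) by Langevin dynamics."""
    x = rng.normal(size=n)                      # initialize the chains
    for _ in range(steps):
        grad_e = 2.0 * a * x                    # dE/dx for E = a x^2
        x = x - 0.5 * step_size * grad_e + np.sqrt(step_size) * rng.normal(size=n)
    return x

for _ in range(200):
    x_syn = langevin(a, n=1000)                 # (1) synthesis step (MCMC)
    # (2) analysis step: gradient ascent on the log-likelihood,
    # d/da log p(data) = E_model[x^2] - mean(x_data^2)
    a += 0.05 * (np.mean(x_syn ** 2) - np.mean(x_data ** 2))

print(a)  # approaches the true value 0.5
```

Note how the parameter update needs only the statistical difference between synthesized and observed samples; no auxiliary network appears anywhere in the loop.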
We first propose an EBM for unordered set data, such as the point clouds widely used in 3D shape representation. In this part, we propose a generative model of unordered point sets in the form of an EBM, where the energy function is parameterized by an input-permutation-invariant bottom-up neural network. The energy function learns a coordinate encoding of each point and then aggregates all individual point features into an energy for the whole point cloud. We call our model the Generative PointNet because it can be derived from the discriminative PointNet. Our model can be trained by MCMC-based maximum likelihood learning (as well as its variants), without the help of any assisting networks like those in GANs and VAEs. Unlike most point cloud generators, our model does not require any hand-crafted distance metric for point cloud generation, because it synthesizes point clouds by matching observed examples in terms of statistical properties defined by the energy function. Furthermore, we can learn a short-run MCMC toward the EBM as a flow-like generator for point cloud reconstruction and interpolation. The learned point cloud representation can be useful for point cloud classification and segmentation.
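The key structural property — a permutation-invariant energy built from a shared per-point map followed by symmetric pooling — can be shown in a few lines. The tiny one-layer encoder and random weights below are illustrative stand-ins for the trained Generative PointNet:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical weights for a tiny per-point encoder followed by pooling;
# the actual Generative PointNet uses a deeper network trained by
# MCMC-based maximum likelihood. This only illustrates the invariance.
W1 = rng.normal(size=(3, 16))   # shared map: 3-D coordinates -> 16 features
w2 = rng.normal(size=16)        # maps the pooled feature to a scalar energy

def energy(points):
    """Permutation-invariant energy: shared encoding per point, then max-pool."""
    h = np.tanh(points @ W1)    # (n_points, 16), same map for every point
    pooled = h.max(axis=0)      # symmetric aggregation over the whole set
    return float(pooled @ w2)   # scalar energy for the point cloud

cloud = rng.normal(size=(128, 3))           # a random point cloud
shuffled = cloud[rng.permutation(128)]      # same set, different ordering
print(np.isclose(energy(cloud), energy(shuffled)))  # True: order does not matter
```

Because both the per-point encoding and the pooling treat every point identically, reordering the set cannot change the energy — which is what makes the density well defined on unordered sets.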
We then design a novel implicit shape representation based on EBMs. An implicit representation, which uses a function to represent a 3D shape, performs well in the 3D graphics field. Unlike previous work, which required a human-defined function, we propose an energy-based implicit function defined as a natural representation of the probability that a point is on the surface. The energy-based implicit function learns a probability distribution for points over 3D space. With the introduction of a conditional latent code, a single energy function approximated by a deep neural network can represent multiple objects. We use importance sampling and maximum likelihood estimation to learn this network. Our training procedure requires neither extra human-defined loss functions nor sampled points off the surface. Furthermore, we combine this energy-based implicit function with a variational autoencoder for improved generative capacity.
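The idea of an energy-based implicit function can be illustrated with an analytic stand-in for the learned network: an energy that grows with distance from a unit sphere, so that exp(−E(x)) — the unnormalized probability that a point lies on the surface — peaks exactly on the shape. The sphere and the sharpness constant are illustrative assumptions, not the dissertation's learned model:

```python
import numpy as np

# Analytic stand-in for a learned energy-based implicit function:
# low energy near the surface of a unit sphere, high energy elsewhere,
# so p(x) ∝ exp(-E(x)) concentrates on the shape.
def energy(x, radius=1.0, sharpness=50.0):
    dist_to_surface = np.abs(np.linalg.norm(x, axis=-1) - radius)
    return sharpness * dist_to_surface ** 2

on_surface = np.array([1.0, 0.0, 0.0])   # lies exactly on the sphere
off_surface = np.array([0.3, 0.0, 0.0])  # well inside the sphere
# The unnormalized density is far higher for the point on the surface.
print(np.exp(-energy(on_surface)) > np.exp(-energy(off_surface)))  # True
```

In the learned setting, a neural network replaces this analytic energy and a conditional latent code selects which object's surface the density concentrates on.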
Finally, we focus on the problem of continuous inverse optimal control (over a finite time horizon): learning the unknown cost function over a sequence of continuous control variables from expert demonstrations. In this part, we study this fundamental problem in the framework of EBMs, where the observed expert trajectories are assumed to be random samples from a probability density defined as the exponential of the negative cost function, up to a normalizing constant. The parameters of the cost function are learned by maximum likelihood via an ``analysis by synthesis'' scheme, which iterates (1) a synthesis step: sample synthesized trajectories from the current probability density using Langevin dynamics via back-propagation through time, and (2) an analysis step: update the model parameters based on the statistical difference between the synthesized trajectories and the observed trajectories. Since an efficient optimization algorithm is usually available for an optimal control problem, we also consider a convenient approximation of the above learning method in which we replace the sampling in the synthesis step with optimization. Moreover, to make the sampling or optimization more efficient, we propose to train the EBM simultaneously with a top-down trajectory generator via cooperative learning, where the trajectory generator is used to quickly initialize the synthesis step of the EBM. We demonstrate that the proposed methods work well on autonomous driving tasks and show that they can learn suitable cost functions for optimal control.
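The optimization variant of the synthesis step can be sketched on a toy control problem: gradient-descend a control sequence under a known cost (reach a goal state with small controls) by differentiating the cost through the rollout. The 1-D linear dynamics and quadratic cost below are illustrative assumptions, not the learned autonomous-driving cost; adding Gaussian noise to each gradient step would turn this optimization back into the Langevin sampling variant:

```python
import numpy as np

def rollout(u, x0=0.0):
    """Linear dynamics x_{t+1} = x_t + u_t; returns the state sequence."""
    return x0 + np.cumsum(u)

T = 10
goal, ctrl_weight = 5.0, 0.1
u = np.zeros(T)  # control sequence to be optimized

for _ in range(200):
    x_final = rollout(u)[-1]
    # Gradient of cost (x_T - goal)^2 + ctrl_weight * sum(u^2) w.r.t. u.
    # Since dx_T/du_t = 1 for every t here, "back-propagation through
    # time" reduces to broadcasting the terminal error to all time steps.
    grad = 2.0 * (x_final - goal) + 2.0 * ctrl_weight * u
    u -= 0.05 * grad

print(rollout(u)[-1])  # final state close to the goal of 5.0
```

In the inverse problem, the cost parameters themselves are then updated from the statistical difference between these synthesized trajectories and the expert demonstrations, exactly as in the analysis step described above.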