Generative modeling, as an unsupervised learning approach, is a promising way to learn meaningful representations without focusing on specific tasks. Finding such generative models is one of the most fundamental problems in statistics, computer vision, and artificial intelligence research. The deep energy-based model (EBM) is one of the most promising candidates. Previous work has demonstrated the capability of EBMs in the image domain. In this dissertation, we explore the capability of EBMs in three important domains: unordered set modeling, 3D shape representation, and continuous inverse optimal control. For each domain, we propose a novel EBM-based approach and obtain competitive results.
Originating from statistical physics, an EBM directly defines a probability density as the exponential of the negative energy function, up to a normalizing constant, where the energy function maps the input variable to a scalar energy. Training an EBM from observed data entails finding an energy function that assigns lower energies to observed data than to unobserved data. Given the observed training data, EBMs are trained by maximum likelihood estimation, which leads to an ``analysis by synthesis'' algorithm. The training process iterates the following two steps: (1) Synthesis step: sample data from the current probability distribution using a Markov chain Monte Carlo (MCMC) method. (2) Analysis step: update the model parameters based on the statistical difference between the synthesized data and the observed data. Compared with other commonly used generative models, such as the Generative Adversarial Network (GAN) or the Variational Auto-encoder (VAE), the EBM is appealing because (1) it provides an explicit density function for the data; (2) its training does not rely on any auxiliary models; (3) its training does not suffer from mode collapse; and (4) it unifies representation and generation in a single framework.
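As a concrete reading of the description above, the density and the two training steps can be written as follows. The notation is assumed here for illustration and does not come from the original text: $E_\theta$ is the energy function with parameters $\theta$, $x_1, \dots, x_n$ are observed examples, $\tilde{x}_1, \dots, \tilde{x}_m$ are synthesized examples, and Langevin dynamics with step size $\delta$ is shown as one common choice of MCMC sampler.
\[
p_\theta(x) = \frac{1}{Z(\theta)} \exp\bigl(-E_\theta(x)\bigr), \qquad Z(\theta) = \int \exp\bigl(-E_\theta(x)\bigr)\, dx
\]
\[
\nabla_\theta \frac{1}{n} \sum_{i=1}^{n} \log p_\theta(x_i) = \mathrm{E}_{p_\theta}\bigl[\nabla_\theta E_\theta(x)\bigr] - \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta E_\theta(x_i) \approx \frac{1}{m} \sum_{j=1}^{m} \nabla_\theta E_\theta(\tilde{x}_j) - \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta E_\theta(x_i)
\]
\[
\tilde{x}_{\tau+1} = \tilde{x}_\tau - \frac{\delta^2}{2} \nabla_x E_\theta(\tilde{x}_\tau) + \delta\, \epsilon_\tau, \qquad \epsilon_\tau \sim \mathcal{N}(0, I)
\]
Under these assumptions, the last update corresponds to the synthesis step, and the gradient approximation in the middle line corresponds to the analysis step: it compares the energy gradients averaged over synthesized and observed examples.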
We first propose an EBM for unordered set data, such as point clouds, which are widely used in 3D shape representation. The model is a generative model of unordered point sets whose energy function is parameterized by an input-permutation-invariant bottom-up neural network. The energy function learns a coordinate encoding of each point and then aggregates all individual point features into a single energy for the whole point cloud. We call our model the Generative PointNet because it can be derived from the discriminative PointNet. Our model can be trained by MCMC-based maximum likelihood learning (as well as its variants), without the help of any assisting networks like those in GANs and VAEs. Unlike most point cloud generators, our model does not require a hand-crafted distance metric for point cloud generation, because it synthesizes point clouds by matching observed examples in terms of statistical properties defined by the energy function. Furthermore, we can learn a short-run MCMC toward the EBM as a flow-like generator for point cloud reconstruction and interpolation. The learned point cloud representation can be useful for point cloud classification and segmentation.
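The permutation-invariance idea can be illustrated with a minimal sketch. It is not the Generative PointNet architecture: the layer sizes, the random weights, the ReLU encoder, the average pooling, and the names `make_params`, `point_features`, and `energy` are all assumptions made for illustration. It only shows that encoding each point with shared weights and then pooling yields an energy that does not depend on point order.

```python
import math
import random


def make_params(in_dim=3, hidden=8, seed=0):
    """Random weights for a tiny shared per-point encoder (illustrative only)."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0.0, 0.5) for _ in range(in_dim)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.gauss(0.0, 0.5) for _ in range(hidden)]
    return w1, b1, w2


def point_features(point, w1, b1):
    """Encode one point's coordinates with a single ReLU layer."""
    return [
        max(0.0, sum(w * x for w, x in zip(row, point)) + b)
        for row, b in zip(w1, b1)
    ]


def energy(points, params):
    """Scalar energy of a point set: shared encoder, average pooling, linear readout."""
    w1, b1, w2 = params
    feats = [point_features(p, w1, b1) for p in points]
    n = len(feats)
    # Average pooling is symmetric in the points, so their order does not matter.
    pooled = [sum(f[k] for f in feats) / n for k in range(len(b1))]
    return sum(w * h for w, h in zip(w2, pooled))


if __name__ == "__main__":
    rng = random.Random(1)
    cloud = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(16)]
    shuffled = cloud[:]
    rng.shuffle(shuffled)
    params = make_params()
    # The two energies agree up to floating-point rounding.
    print(math.isclose(energy(cloud, params), energy(shuffled, params), abs_tol=1e-9))
```

Any symmetric pooling (for example, max or sum) would serve the same purpose; average pooling is used here only to keep the sketch short.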
We then design a novel implicit shape representation based on the EBM. Implicit representations, which use a function to represent a 3D shape, have shown strong performance in 3D graphics. Unlike previous work, which requires a human-defined function, we propose an energy-based implicit function that naturally represents the probability that a point lies on the surface. The energy-based implicit function learns a probability distribution over points in 3D space. By conditioning on a latent code, an energy function approximated by a deep neural network can represent multiple objects. We use importance sampling and maximum likelihood estimation to learn this network. Our training procedure requires neither extra human-defined loss functions nor sample points that are off the surface. Furthermore, we combine this energy-based implicit function with a variational auto-encoder to improve its generation capacity.
Finally, we focus on the problem of continuous inverse optimal control (over a finite time horizon), that is, learning the unknown cost function over a sequence of continuous control variables from expert demonstrations. We study this fundamental problem in the framework of the EBM, where the observed expert trajectories are assumed to be random samples from a probability density function defined as the exponential of the negative cost function up to a normalizing constant. The parameters of the cost function are learned by maximum likelihood via an ``analysis by synthesis'' scheme, which iterates (1) synthesis step: sample synthesized trajectories from the current probability density using Langevin dynamics via back-propagation through time, and (2) analysis step: update the model parameters based on the statistical difference between the synthesized trajectories and the observed trajectories. Because an efficient optimization algorithm is usually available for an optimal control problem, we also consider a convenient approximation of the above learning method, in which we replace the sampling in the synthesis step with optimization. Moreover, to make the sampling or optimization more efficient, we propose a method to train the EBM simultaneously with a top-down trajectory generator via cooperative learning, where the trajectory generator is used to quickly initialize the synthesis step of the EBM. We demonstrate that the proposed methods work well on autonomous driving tasks and show that they can learn suitable cost functions for optimal control.
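The synthesis step and its optimization-based approximation can be sketched on a toy problem. Everything in this sketch is an assumption made for illustration: the quadratic cost, its hand-written gradient (standing in for back-propagation through time), the step size, and the names `cost_grad` and `langevin_synthesis` are hypothetical and are not the cost function or implementation used in the dissertation.

```python
import random


def cost_grad(controls, targets):
    """Gradient of the toy cost 0.5 * sum_t (u_t - target_t)^2 w.r.t. the controls."""
    return [u - g for u, g in zip(controls, targets)]


def langevin_synthesis(init, targets, n_steps=200, step=0.5, noise=True, seed=0):
    """Run Langevin updates on a control sequence under the toy cost.

    With noise=True this samples approximately from exp(-cost).
    With noise=False it reduces to plain gradient descent, which mirrors
    replacing sampling with optimization in the synthesis step.
    """
    rng = random.Random(seed)
    u = list(init)
    for _ in range(n_steps):
        grad = cost_grad(u, targets)
        u = [
            ui - 0.5 * step ** 2 * gi + (step * rng.gauss(0.0, 1.0) if noise else 0.0)
            for ui, gi in zip(u, grad)
        ]
    return u


if __name__ == "__main__":
    targets = [0.5, -0.2, 0.1]   # minimizer of the toy cost
    start = [0.0, 0.0, 0.0]      # could instead come from a learned generator
    print(langevin_synthesis(start, targets, noise=False))  # approaches targets
    print(langevin_synthesis(start, targets, noise=True))   # a noisy sample
```

In this sketch the `init` argument plays the role that the trajectory generator plays in cooperative learning: a better starting point means fewer update steps are needed.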