Coupling and Learning Hierarchical Generative and Descriptive Models for Image Synthesis and Analysis
 Author(s): Lu, Yang
 Advisor(s): Zhu, Song-Chun, et al.
Abstract
Learning a generative model with compositional structure is a fundamental problem in statistics. My thesis generalizes two major classical statistical models by introducing convolutional neural networks (ConvNets): (1) the \textit{exponential family model}, which is generalized to a descriptor model by a bottom-up ConvNet; (2) the \textit{latent factor model}, which is generalized to a generator model by a top-down ConvNet.
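In commonly used notation (assumed here for clarity; the symbols are not fixed by the abstract itself), the two generalized models can be written as:

```latex
% Descriptor model: exponential tilting of a reference distribution q(I)
% by a bottom-up ConvNet score f(I; \theta), with normalizing constant Z(\theta)
p(I; \theta) = \frac{1}{Z(\theta)} \exp\big( f(I; \theta) \big)\, q(I)

% Generator model: a top-down ConvNet g(z; \theta) maps latent factors z
% to the mean of a Gaussian observation model
z \sim \mathrm{N}(0, I_d), \qquad
I = g(z; \theta) + \epsilon, \quad \epsilon \sim \mathrm{N}(0, \sigma^2 I_D)
```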
The probability distribution of the descriptor model takes the form of an exponential tilting of a reference distribution, and the descriptor can be derived directly from a discriminative ConvNet. Assuming rectified linear units and a Gaussian white noise reference distribution, the descriptor contains a representational structure with multiple layers of binary activation variables, which reconstruct the mean of the corresponding Gaussian piece. The model is learned by maximum likelihood estimation (MLE). The Langevin dynamics for data synthesis is driven by the reconstruction error, and the corresponding gradient descent dynamics converges to a local energy minimum that is auto-encoding.
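As a minimal sketch of the Langevin synthesis step, the snippet below samples from a descriptor-style density with a toy quadratic score standing in for the ConvNet; the function `grad_f`, the step size `eps`, and all settings are illustrative assumptions, not the thesis's actual model.

```python
import numpy as np

def grad_f(I, w=1.0):
    # Gradient of a toy score f(I) = -0.5 * w * ||I - 1||^2
    # (a hypothetical stand-in for the bottom-up ConvNet score).
    return -w * (I - 1.0)

def langevin_step(I, rng, sigma=1.0, eps=0.1):
    # Energy U(I) = -f(I) + ||I||^2 / (2 sigma^2), where the second term
    # comes from the Gaussian white noise reference distribution.
    # One Langevin update: descend U, then inject Gaussian noise.
    grad_U = -grad_f(I) + I / sigma**2
    return I - 0.5 * eps**2 * grad_U + eps * rng.standard_normal(I.shape)

rng = np.random.default_rng(0)
I = rng.standard_normal(16)        # a tiny "image" of 16 pixels
for _ in range(2000):
    I = langevin_step(I, rng)
```

Without the noise term, the same update is plain gradient descent on the energy, which is the dynamics the abstract says converges to an auto-encoding local minimum.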
The probability distribution of the generator model is a multivariate Gaussian whose mean is computed by a nonlinear ConvNet mapping of the latent factors. The model is learned by an alternating backpropagation algorithm, which is in the spirit of the classical Expectation-Maximization (EM) algorithm. Alternating backpropagation iterates the following two steps: (a) \textit{inferential backpropagation}, which infers the latent factors by Langevin dynamics or gradient descent; (b) \textit{learning backpropagation}, which updates the parameters given the inferred latent factors by gradient descent. The gradient computations in both steps are powered by backpropagation and share most of their code.
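The two alternating steps can be sketched on a toy problem. Here a *linear* generator `I = W z + noise` stands in for the top-down ConvNet, so both gradients have closed forms; the matrices, step sizes, and iteration counts are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic training data from a hypothetical ground-truth linear generator.
W_true = rng.standard_normal((8, 2))
X = W_true @ rng.standard_normal((2, 200)) + 0.1 * rng.standard_normal((8, 200))

W = 0.1 * rng.standard_normal((8, 2))   # generator parameters
Z = np.zeros((2, 200))                  # latent factors, one column per example
sigma2 = 0.01                           # observation noise variance

for it in range(300):
    # (a) Inferential backpropagation: gradient steps on z given W,
    #     following the log posterior log p(X|z) + log p(z)
    #     (sigma2 is absorbed into the step size).
    for _ in range(20):
        Z += 0.05 * (W.T @ (X - W @ Z) - sigma2 * Z)
    # (b) Learning backpropagation: gradient step on W given the inferred z.
    W += 0.1 * (X - W @ Z) @ Z.T / X.shape[1]

recon_err = np.mean((X - W @ Z) ** 2)
```

Both updates are driven by the same residual `X - W @ Z`, which is the toy analogue of the shared backpropagation code noted in the abstract.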
The learning algorithms of the two models can be interwoven into a cooperative training algorithm: the generator model produces synthesized examples that jump-start the Markov chain Monte Carlo (MCMC) sampling of the descriptor model, fueling the descriptor's learning. The generator model then learns from how the descriptor's MCMC revises the synthesized examples, and this learning is supervised because the latent factors that produced each example are known.
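The cooperative loop can be illustrated in one dimension. Below, the descriptor is the exponential tilting p(y) ∝ exp(θy)·q(y) with q = N(0, 1), i.e. N(θ, 1), and the generator is y = μ + z with z ~ N(0, 1); the names `theta`, `mu` and all step sizes are toy assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
data = 2.0 + rng.standard_normal(1000)   # observed examples ~ N(2, 1)

theta, mu = 0.0, 0.0
eps, n = 0.3, 256
for it in range(500):
    z = rng.standard_normal(n)
    y = mu + z                           # generator jump-starts the MCMC
    for _ in range(30):                  # descriptor's Langevin revision
        grad = theta - y                 # d/dy log p(y) for p = N(theta, 1)
        y = y + 0.5 * eps**2 * grad + eps * rng.standard_normal(n)
    # Descriptor MLE step: match data statistics against revised synthesis.
    theta += 0.05 * (data.mean() - y.mean())
    # Generator step: the latent z behind each revised y is known,
    # so fitting mu is supervised regression of (y - z) onto mu.
    mu += 0.05 * ((y - z).mean() - mu)
```

The key point of the design is visible in the last line: because the generator created the initial samples itself, it knows the latent factor of every revised example, so no separate inference step is needed during cooperative learning.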
The experimental results show that the two models can generate realistic images, audio, and dynamic patterns. Moreover, the generator model can also learn from incomplete or indirect training data. The generator model and the cooperative training algorithm outperform the generative adversarial network (GAN) and the variational autoencoder (VAE) on data recovery and completion tasks.