Effective Learning of Descriptive and Generator Models and Learning Representations for Grid Cells and V1 Cells
 Gao, Ruiqi
Advisor(s): Zhu, Song-Chun
Abstract
In recent decades, deep learning has achieved tremendous success in supervised learning. However, unsupervised learning and representation learning, i.e., learning the hidden structure of data without expensive and time-consuming human annotation, remain a fundamental challenge, one that likely underlies the gap between current artificial intelligence and the intelligence of a biological brain. In this thesis, we propose novel solutions to problems in this area. Specifically, we work on deep generative modeling, an important approach to unsupervised learning, and on representation learning inspired by structures in the brain.
1. We propose efficient algorithms for learning descriptive models, also known as energy-based models (EBMs). Although descriptive models are an appealing class of generative models with a number of desirable properties, learning them on high-dimensional spaces remains challenging, as it involves computationally expensive Markov chain Monte Carlo (MCMC). To tackle this problem, we propose a multi-grid modeling and sampling method, which learns descriptive models at multiple scales or resolutions, with MCMC sampling following a coarse-to-fine scheme. This approach enables efficient learning and sampling of descriptive models from large-scale image datasets with small-budget MCMC. We then extend this method to an improved version named diffusion recovery likelihood, in which a sequence of descriptive models is learned on increasingly noisy versions of a dataset. Each descriptive model is trained by sampling from the conditional probability of the data at a certain noise level given their noisy versions at a higher noise level, which further reduces the burden of MCMC.

2. We develop dynamic and motion-based generator models that learn semantically meaningful vector representations of spatial-temporal processes, such as dynamic textures and action sequences in video data. The models learn disentangled representations of appearance, trackable motion, and intrackable motion in spatial-temporal processes in a fully unsupervised manner. We also propose an efficient learning algorithm named alternating back-propagation through time, which learns the proposed models using online MCMC inference without resorting to auxiliary networks.
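To make the recovery-likelihood idea in item 1 concrete, the following is a minimal toy sketch, not the thesis implementation: Langevin dynamics samples from the conditional distribution p(x | x_noisy) ∝ exp(-E(x) - ||x - x_noisy||²/(2σ²)), where the quadratic term comes from the Gaussian noising process. The 1D double-well energy, the numerical gradient, and all hyperparameters here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def energy(x):
    # Hypothetical toy energy: a 1D double well standing in for a learned f_theta.
    return (x**2 - 1.0)**2

def grad_energy(x, eps=1e-4):
    # Finite-difference gradient; a real model would use autodiff.
    return (energy(x + eps) - energy(x - eps)) / (2 * eps)

def langevin_recovery(x_noisy, sigma=0.5, step=0.01, n_steps=200):
    """Sample from p(x | x_noisy) ∝ exp(-energy(x) - (x - x_noisy)^2 / (2 sigma^2)).
    The quadratic term keeps the chain near x_noisy, which is why the
    conditional is much easier to sample than the marginal."""
    x = x_noisy.copy()
    for _ in range(n_steps):
        grad = grad_energy(x) + (x - x_noisy) / sigma**2
        x = x - 0.5 * step * grad + np.sqrt(step) * rng.standard_normal(x.shape)
    return x

x_noisy = np.array([1.3])
x_clean = langevin_recovery(x_noisy)
```

Because the conditional is close to unimodal, a short chain suffices; this is the sense in which recovery likelihood reduces the MCMC burden compared with sampling the marginal model directly.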
3. We propose hybrid generative models that integrate the advantages of different classes of generative models. Specifically, we propose a training algorithm, flow contrastive estimation, that jointly estimates a descriptive model and a flow-based model, with the two models updated iteratively based on a shared adversarial value function. The algorithm is an extension of noise contrastive estimation (NCE) and combines the flexibility of descriptive models with the tractability of flow-based models. We also study another hybrid model in which the descriptive model serves as a correction, or exponential tilting, of the flow-based model. We show that this model has a particularly simple form in the space of the latent variables of the flow-based model, and that MCMC sampling of the descriptive model in the latent space mixes well and traverses modes in the data space.
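The NCE-style value function underlying flow contrastive estimation can be sketched as follows. This is a toy 1D illustration under stated assumptions: the "descriptive model" is an unnormalized Gaussian with one parameter, and the contrast distribution is a fixed N(0, 1) standing in for the learned flow; the function names are hypothetical. The classifier D(x) = sigmoid(log p_model(x) - log Z - log p_noise(x)) is trained to separate data from contrast samples.

```python
import numpy as np

def log_p_model(x, theta):
    # Hypothetical unnormalized log-density of the descriptive model (EBM):
    # a zero-mean Gaussian with precision theta standing in for f_theta.
    return -0.5 * theta * x**2

def log_p_noise(x):
    # Log-density of the contrast distribution; in flow contrastive
    # estimation this would be a learned normalizing flow, here a fixed N(0, 1).
    return -0.5 * x**2 - 0.5 * np.log(2 * np.pi)

def nce_value(x_data, x_noise, theta, log_z):
    """NCE objective: E_data[log D] + E_noise[log(1 - D)], where the logit of
    D(x) is log p_model(x) - log Z - log p_noise(x)."""
    def logit(x):
        return log_p_model(x, theta) - log_z - log_p_noise(x)
    # log sigmoid(u) = -logaddexp(0, -u), computed stably
    pos = -np.logaddexp(0.0, -logit(x_data))   # log D(x_data)
    neg = -np.logaddexp(0.0, logit(x_noise))   # log(1 - D(x_noise))
    return pos.mean() + neg.mean()

rng = np.random.default_rng(0)
x_data = rng.normal(0.0, 1.0, size=1000)   # pretend the data are N(0, 1)
x_noise = rng.normal(0.0, 1.0, size=1000)
# At the optimum (theta = 1, correct log Z) the logit is 0 everywhere,
# so the value is exactly 2 * log(1/2) = -2 log 2:
val = nce_value(x_data, x_noise, theta=1.0, log_z=0.5 * np.log(2 * np.pi))
```

In flow contrastive estimation both sides of this value function are learned: the descriptive model ascends it while the flow is updated against it, which is the shared adversarial value function mentioned above.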
4. We propose an optimization-based representational model of grid cells. Grid cells exist in the mammalian medial entorhinal cortex (mEC) and are so named because individual neurons exhibit striking firing patterns that form hexagonal grids when the agent (such as a rat) navigates in a 2D open field. To understand how grid cells perform path integration, we conduct a theoretical analysis of a general representational model of grid cells in which the 2D self-position of the agent is represented by a higher-dimensional vector and the 2D self-motion is represented by a general transformation of that vector. We identify two conditions on the general transformation and demonstrate an important geometric property of it, namely local conformal embedding. We further investigate the simplest transformation, the linear transformation, and uncover its explicit algebraic and geometric structure as a matrix Lie group of rotations. The model learns the characteristic hexagonal patterns of grid cells and is capable of accurate path integration.
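The rotation-group structure of the linear model can be illustrated with a hand-constructed toy instance (an assumption for illustration, not the learned model): the representation vector is a block-diagonal stack of 2D unit vectors, each rotated by the projection of position onto a preferred direction b_k. Path integration then amounts to applying a motion-dependent block rotation M(dx), and because rotations compose additively in the angle, integrating a displacement agrees exactly with encoding the displaced position.

```python
import numpy as np

def rotation(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s], [s, c]])

# Three preferred directions 60 degrees apart, echoing hexagonal tuning
# (an illustrative choice, not learned parameters).
B = np.array([[1.0, 0.0], [0.5, np.sqrt(3) / 2], [-0.5, np.sqrt(3) / 2]])

def encode(pos):
    """v(x): each 2D block is the unit vector e1 rotated by <b_k, x>."""
    return np.concatenate([rotation(b @ pos) @ np.array([1.0, 0.0]) for b in B])

def transform(v, dpos):
    """Path integration v(x + dx) = M(dx) v(x), with M(dx) block-diagonal."""
    out = np.empty_like(v)
    for k, b in enumerate(B):
        out[2 * k:2 * k + 2] = rotation(b @ dpos) @ v[2 * k:2 * k + 2]
    return out

x0 = np.array([0.3, -0.2])
dx = np.array([0.1, 0.25])
v_integrated = transform(encode(x0), dx)   # rotate the current representation
v_direct = encode(x0 + dx)                 # encode the displaced position
err = np.max(np.abs(v_integrated - v_direct))  # agreement up to float error
```

The group property M(dx1) M(dx2) = M(dx1 + dx2) of these block rotations is what lets a recurrent update track position over an arbitrary trajectory.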
5. We extend the representational model of grid cells to an optimization-based representational model of V1 simple cells. V1 is the primary visual cortex of the mammalian brain, and V1 simple cells are highly specialized for low-level motion perception and pattern recognition. We propose a representational model of V1 simple cells that couples two components: (1) vector representations of the local contents of images and (2) matrix representations of local pixel displacements caused by the relative motions between the agent and the objects in the 3D scene. The model learns Gabor-like tunings of V1 simple cells, and, as with V1 simple cells, the learned adjacent neurons exhibit quadrature-phase relations.
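For readers unfamiliar with the terminology, a Gabor function is a Gaussian envelope multiplied by a sinusoidal carrier, and a quadrature pair is two such filters identical except for a 90-degree phase shift; such a pair is orthogonal. The sketch below constructs a pair directly (an illustrative construction with assumed parameters, not the learned tunings of the thesis model).

```python
import numpy as np

def gabor(size, freq, theta, phase, sigma):
    """2D Gabor function: isotropic Gaussian envelope times a sinusoidal
    carrier oriented at angle theta with spatial frequency freq."""
    ax = np.arange(size) - (size - 1) / 2
    X, Y = np.meshgrid(ax, ax)
    Xr = X * np.cos(theta) + Y * np.sin(theta)   # coordinate along the carrier
    env = np.exp(-(X**2 + Y**2) / (2 * sigma**2))
    return env * np.cos(2 * np.pi * freq * Xr + phase)

# Quadrature pair: same orientation, frequency, and envelope; phases differ by 90°.
g_even = gabor(33, freq=0.1, theta=np.pi / 4, phase=0.0, sigma=5.0)
g_odd = gabor(33, freq=0.1, theta=np.pi / 4, phase=np.pi / 2, sigma=5.0)

# The quadrature-phase relation: the pair is orthogonal (inner product ~ 0),
# since cos * sin is odd over the symmetric Gaussian window.
dot = np.sum(g_even * g_odd)
```

In the model above, a local pixel displacement acts on the vector representation as a rotation, and the cosine/sine components of each rotated sub-vector correspond to such an even/odd quadrature pair.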