Open Access Publications from the University of California

## Learning Descriptive and Generative Models with Short-Run MCMC

• Author(s): Nijkamp, Erik Lennart
What is vision? The mystery of how the visual cortex extracts abstract concepts from a plethora of visual sensory stimuli has captivated pioneers such as Hermann von Helmholtz and David Marr for the past century. Helmholtz states that what we see is the solution to a computational problem: our brains compute the most likely causes of the photon absorptions within our eyes. In his monumental work "Vision", Marr conceptualizes the process of vision as a set of representations, starting from a description of the input image and culminating in a description of three-dimensional objects in the surrounding environment. David Bryant Mumford proposes hierarchical Bayesian inference as a means to understand the visual cortex. In the context of predictive coding theory, Mumford argues that the function of the hierarchical structure in the cortex is to reconcile representations and predictions of sensory stimuli at multiple levels. The assumption is that the dynamics of neural activity are guided towards minimizing the discrepancy, or error, between the input representation at each level and the prediction originating from a higher-level representation. Song-Chun Zhu and Ying Nian Wu propose a holistic realization of Marr's paradigm with rigorous statistical modeling in their work "Computer Vision - Statistical Models for Marr's Paradigm".
We follow in these footsteps towards a realization of Marr's paradigm and frame vision as the problem of Bayesian posterior inference in a latent-variable model. Following predictive coding theory, we believe that higher-level representations emerge from a reconstruction of the sensory stimuli as an inference process in a top-down model, for which inference may be amortized in a bottom-up model. Notably, the posterior inference takes the form of a Markov chain Monte Carlo (MCMC) sampling process that maintains a set of the most probable candidate solutions. This approach naturally explains observations such as the resolution of ambiguity in poorly handwritten text, as well as phenomena such as hallucination. In this sense, "seeing" itself is merely an illusion. The dominance of top-down processing within the cortex is supported not only by observations such as these, but also by findings in neuroscience concerning the structure of the cortex.