eScholarship
Open Access Publications from the University of California

UC San Diego

UC San Diego Electronic Theses and Dissertations

Perceptual inference in generative models

Abstract

We see and hear so effortlessly that it is not obvious to the casual observer why perception should be such a difficult problem for modern science to understand. David Marr suggested that understanding perception requires analyzing the problems it solves, along with the assumptions necessary for a solution. In this thesis I maintain that generative probabilistic models are a powerful tool for implementing Marr's approach. In a generative model, one must explicitly encode the assumptions and goals of a perceptual problem, whereas specific knowledge of the world is gleaned from the sensory data by learning within the model. This thesis explores the use of generative models for understanding perception in audio-visual systems as well as in the individual modalities. The cocktail-party problem of single-channel sound separation is addressed using competing sound models, including a novel factorial model that unites pitch tracking and formant-based models. A convolutional hidden Markov model for video tracking performs exact inference over maps of object location, using a novel technique that makes this inference tractable for extremely large hypothesis spaces. A model of non-rigid 3D tracking is presented in which a few simple assumptions unify template matching and optic flow within the same framework. Finally, an audio-visual model brings together aspects of each of these models to exploit cross-modal information for speech enhancement. Along the way, key benefits of generative modeling, such as the flexibility of inference, the "explaining-away" phenomenon, and the "problem-level" formulation of the models, are introduced and discussed in light of the research presented.
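The "explaining-away" phenomenon mentioned above can be illustrated with a generic textbook example that is not taken from the thesis: two independent binary causes A and B can each turn on an observed effect E through a noisy-OR. Observing E raises belief in A, but additionally learning that B is present "explains" E and pushes belief in A back toward its prior. The function names (`joint`, `posterior_a`) and all parameter values below are hypothetical, chosen only for illustration; inference is done by exact enumeration over the two hidden causes.

```python
from itertools import product


def joint(a, b, e, p_a=0.1, p_b=0.1, w_a=0.9, w_b=0.9, leak=0.01):
    """Joint probability P(A=a, B=b, E=e) in a small noisy-OR network.

    All parameters are illustrative assumptions, not values from the thesis.
    """
    # Independent priors on the two hidden causes.
    prior = (p_a if a else 1 - p_a) * (p_b if b else 1 - p_b)
    # Noisy-OR: E stays off only if the leak and every active cause all fail to turn it on.
    p_off = (1 - leak) * ((1 - w_a) if a else 1.0) * ((1 - w_b) if b else 1.0)
    p_e = (1 - p_off) if e else p_off
    return prior * p_e


def posterior_a(e=1, b=None):
    """P(A=1 | E=e), or P(A=1 | E=e, B=b) when b is given, by exact enumeration."""
    num = den = 0.0
    for a, bb in product((0, 1), repeat=2):
        if b is not None and bb != b:
            continue  # condition on the observed value of B
        p = joint(a, bb, e)
        den += p
        if a:
            num += p
    return num / den


if __name__ == "__main__":
    print(round(posterior_a(e=1), 3))       # 0.505: seeing E makes A plausible
    print(round(posterior_a(e=1, b=1), 3))  # 0.109: B explains E, so A drops back near its prior of 0.1
```

With these assumed parameters, observing the effect lifts P(A=1) from 0.1 to about 0.5, and learning that B is also present lowers it to about 0.11. The two causes are independent a priori but become dependent once their common effect is observed; a generative model reproduces this behavior automatically because inference runs over the joint distribution rather than over each cause in isolation.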
