Scholarly Works (40 results)

Article

The Convergence of Contrastive Divergences

Yuille, Alan L

Department of Statistics Papers (2006)

This paper analyses the Contrastive Divergence algorithm for learning statistical parameters. We relate the algorithm to the stochastic approximation literature. This enables us to specify conditions under which the algorithm is guaranteed to converge to the optimal solution (with probability 1). This includes necessary and sufficient conditions for the solution to be unbiased.

Article

Augmented Rescorla-Wagner and Maximum Likelihood Estimation

Yuille, Alan L

Department of Statistics Papers (2005)

We show that linear generalizations of Rescorla-Wagner can perform Maximum Likelihood estimation of the parameters of all generative models for causal reasoning. Our approach involves augmenting variables to deal with conjunctions of causes, similar to the augmented model of Rescorla. Our results involve genericity assumptions on the distributions of causes. If these assumptions are violated, for example for the Cheng causal power theory, then we show that a linear Rescorla-Wagner can estimate the parameters of the model up to a nonlinear transformation. Moreover, a nonlinear Rescorla-Wagner is able to estimate the parameters directly to within arbitrary accuracy. Previous results can be used to determine convergence and to estimate convergence rates.

Article

The Rescorla-Wagner Algorithm and Maximum Likelihood Estimation of Causal Parameters

Yuille, Alan L

Department of Statistics Papers (2005)

This paper analyzes generalizations of the classic Rescorla-Wagner (R-W) learning algorithm and studies their relationship to Maximum Likelihood estimation of causal parameters. We prove that the parameters of two popular causal models, ΔP and PC, can be learnt by the same generalized linear Rescorla-Wagner (GLRW) algorithm provided genericity conditions apply. We characterize the fixed points of these GLRW algorithms and calculate the fluctuations about them, assuming that the input is a set of i.i.d. samples from a fixed (unknown) distribution. We describe how to determine convergence conditions and calculate convergence rates for the GLRW algorithms under these conditions.

Thesis
Peer Reviewed

Multimodal Learning for Vision and Language

Mao, Junhua
Advisor(s): Yuille, Alan L

UCLA Electronic Theses and Dissertations (2017)

This thesis focuses on proposing and addressing various tasks in the field of vision and language, a new and challenging area which contains the hottest research topics for both computer vision and natural language processing. We first proposed an effective RNN-CNN framework (Recurrent Neural Network-Convolutional Neural Network) to address the task of image captioning (i.e., describing an image with a sentence). Based on this work, we proposed effective models and constructed large-scale datasets for various vision and language tasks, such as unambiguous object descriptions (i.e., referring expressions), image question answering, one-shot novel concept captioning, multimodal word embedding, and multi-label classification. Many of these tasks had not been successfully addressed or even investigated before. Our work is among the first deep learning efforts for these tasks and achieves state-of-the-art results. We hope the methods and datasets proposed in this thesis can provide insight for the future development of vision and language.

Article

Bayesian Models of Object Perception

Department of Statistics Papers (2003)

The human visual system is the most complex pattern recognition device known. In ways that are yet to be fully understood, the visual cortex arrives at a simple and unambiguous interpretation of data from the retinal image that is useful for the decisions and actions of everyday life. Recent advances in Bayesian models of computer vision and in the measurement and modeling of natural image statistics are providing the tools to test and constrain theories of human object perception. In turn, these theories are having an impact on the interpretation of cortical function.

Thesis
Peer Reviewed

Generating Human Images and Ground Truth using Computer Graphics

Qiu, Weichao
Advisor(s): Yuille, Alan L

UCLA Electronic Theses and Dissertations (2016)

Providing high-quality data for computer vision is a challenge. Researchers have spent a lot of effort creating image datasets with more images and more detailed annotation. Computer graphics (CG) is a way of creating synthetic images; during image synthesis, many types of information about the CG scene can be exported as ground truth annotation. In this paper, we develop a pipeline to synthesize realistic human images and automatically generate detailed annotation at the same time. We use 2D annotation to control the pose of the CG human model, which enables our images to contain more poses than motion-capture-based methods. The synthetic images are used to train and evaluate a human pose estimation algorithm to show their usefulness.

Thesis
Peer Reviewed

Unsupervised Learning of Object Descriptors and Compositions

Ye, Xingyao
Advisor(s): Yuille, Alan L

UCLA Electronic Theses and Dissertations (2012)

This thesis presents methods and results to solve the problem of joint object recognition and reconstruction. The proposed solution is a dictionary of deformable image patches and a hierarchical model encoding spatial compositions. Both the dictionary and the composition model are learned from data without supervision. The patch dictionary is shown to achieve state-of-the-art performance on digit recognition while remaining capable of high-quality reconstruction. The hierarchical model is shown to account for human chunk learning behavior not captured by previous theories. Both learning algorithms are significantly faster and easier to use than previous methods of similar purpose.

Thesis
Peer Reviewed

Towards Detecting and Describing Objects: Object Detection, Parsing and Human Pose Estimation

Chen, Xianjie
Advisor(s): Yuille, Alan L

UCLA Electronic Theses and Dissertations (2016)

Detecting and describing objects is one of the fundamental challenges in computer vision. Teaching computers to find and parse objects in the images is an interesting artificial intelligence problem in its own right, and a working technique also has enormous potential to benefit a lot of other computer vision tasks. In this thesis, we focus on three highly correlated tasks for detecting and describing objects, i.e., object detection, object parsing and human pose estimation, and propose a series of novel methods for these tasks.

The first step to recognize an object is arguably to localize it. We start from studying the role of context for object detection and semantic segmentation in the wild. Towards this goal, we label every pixel of PASCAL VOC 2010 detection challenge with a semantic category, and propose a novel deformable part-based contextual reasoning method. We show that this method significantly helps in detecting objects.

Parsing objects into semantic body parts is important for understanding them further. We propose novel graphical model based approaches to describe objects in terms of the semantic body parts. In order to study different representation learning methods, we also study an end-to-end Deep Convolutional Neural Networks (DCNNs) based method to model the relationship between object and body parts in a holistic manner. For training and evaluating our methods, we provide fully annotated object parts for PASCAL VOC 2010.

Humans are among the most important objects, so it is crucial to teach computers to understand different human poses. We present a method for estimating human pose based on a graphical model with novel pairwise relations that make adaptive use of local image measurements. We make novel use of DCNNs and combine their statistical power with the representational flexibility of graphical models. To parse humans when there is significant occlusion, we further propose a novel method for learning occlusion cues and exploit the fact that occlusions often occur in regular patterns. We evaluate these models on popular benchmark datasets and show significant performance improvements over the state of the art.

Article

A Hierarchical Compositional System for Rapid Object Detection

Department of Statistics Papers (2006)

We describe a hierarchical compositional system for detecting deformable objects in images. Objects are represented by graphical models. The algorithm uses a hierarchical tree where the root of the tree corresponds to the full object and lower-level elements of the tree correspond to simpler features. The algorithm proceeds by passing simple messages up and down the tree. The method works rapidly, in under a second, on 320 × 240 images. We demonstrate the approach on detecting cats, horses, and hands. The method works in the presence of background clutter and occlusions. Our approach is contrasted with more traditional methods such as dynamic programming and belief propagation.

Article

Ideal Observers for Detecting Motion: Correspondence Noise

Department of Statistics Papers (2005)

We derive a Bayesian Ideal Observer (BIO) for detecting motion and solving the correspondence problem. We obtain Barlow and Tripathy’s classic model as an approximation. Our psychophysical experiments show that the trends of human performance are similar to the Bayesian Ideal, but overall human performance is far worse. We investigate ways to degrade the Bayesian Ideal but show that even extreme degradations do not approach human performance. Instead we propose that humans perform motion tasks using generic, general-purpose models of motion. We perform more psychophysical experiments which are consistent with humans using a Slow-and-Smooth model and which rule out an alternative model using Slowness.
