
UCLA Electronic Theses and Dissertations

Learning Representation for Scene Understanding: Epitomes, CRFs, and CNNs

Abstract

Scene understanding tasks, such as image classification and semantic image segmentation, have long been challenging problems in computer vision. The difficulty lies mainly in feature representation, i.e., how to find a good representation for images. Rather than improving hand-crafted features such as SIFT or HOG, we focus on learning image representations with generative and discriminative methods.

In this thesis, we explore three directions for learning image representations: (1) generative models, (2) graphical models, and (3) deep neural networks. In particular, we propose a dictionary of epitomes, a compact generative representation that explicitly models the correlation between objects within edge patches as well as the photometric and positional variability of image patches. Subsequently, we exploit Conditional Random Fields (CRFs) to take into account the dependencies between output variables. Finally, we employ deep convolutional neural networks trained on large-scale datasets to learn feature representations, and we further combine CRFs with deep networks to capture more complex representations. We show that the proposed model achieves state-of-the-art performance on challenging semantic image segmentation benchmarks.
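To make the CRF-plus-CNN coupling concrete, the following is a generic sketch of a fully connected CRF energy over a pixel labeling x, in which the unary term is derived from the network's per-pixel class scores and the pairwise term favors consistent labels for similar, nearby pixels; the notation is illustrative and not quoted from the thesis:

% Illustrative CRF energy for semantic segmentation (generic notation, not the thesis's own).
% The unary potential comes from the CNN's per-pixel class probabilities P_CNN(x_i | I),
% and the pairwise potential couples pixels i and j through their features f_i, f_j
% (e.g., position and color) and a label-compatibility function mu.
\begin{aligned}
  E(\mathbf{x}) &= \sum_{i} \theta_i(x_i) + \sum_{i<j} \theta_{ij}(x_i, x_j), \\
  \theta_i(x_i) &= -\log P_{\mathrm{CNN}}(x_i \mid I), \\
  \theta_{ij}(x_i, x_j) &= \mu(x_i, x_j)\, k(\mathbf{f}_i, \mathbf{f}_j).
\end{aligned}

Minimizing E(x), or performing approximate inference over it, refines the raw CNN prediction so that the final labeling reflects both the learned features and the dependencies between neighboring outputs.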
