Learning Representation for Scene Understanding: Epitomes, CRFs, and CNNs
- Author(s): Chen, Liang-Chieh
- Advisor(s): Yuille, Alan L., et al.
Scene understanding, such as image classification and semantic image segmentation, has been a challenging problem in computer vision. The difficulty lies mainly in feature representation, i.e., how to find a good representation for images. Instead of improving on hand-crafted features such as SIFT or HOG, we focus on learning image representations with generative and discriminative methods.
In this thesis, we explore three areas for learning image representations: (1) generative models, (2) graphical models, and (3) deep neural networks. In particular, we propose a dictionary of epitomes, a compact generative representation that explicitly models the correlation between edge patches as well as the photometric and position variability of image patches. Subsequently, we exploit Conditional Random Fields (CRFs) to account for the dependencies between outputs. Finally, we employ Deep Convolutional Neural Networks trained on large-scale datasets to learn feature representations. We further combine CRFs with deep networks to estimate complex representations. Specifically, we show that our proposed model achieves state-of-the-art performance on challenging semantic image segmentation benchmarks.