As humans, we have a remarkable ability to tell objects apart from cluttered backgrounds
and to trace their contours even under occlusion. This ability has long inspired computer vision researchers to study the principles and algorithms of object segmentation. Object segmentation is of both theoretical and practical interest, as it is an essential step towards 3D image understanding and intelligent image editing.
To segment an object, we have to recognize it in order to know which parts should be grouped together. In this thesis, we formulate object segmentation as an image labeling problem in random field models, which facilitates integrating top-down recognition knowledge with bottom-up image cues. The integration can be driven by either bottom-up segmentation or top-down recognition. The segmentation-driven process requires object-level segmentation hypotheses drawn from bottom-up cues, while the recognition-driven process needs shape and context to be effectively represented. This thesis addresses these issues with a data-driven approach. First, we propose to generate object segmentation proposals from segmentation trees using exemplars. Compared to previous parametric methods, our data-driven method takes advantage of both the diversity and the informativeness
of exemplars and thus produces a compact set of highly plausible proposals. Second, we
propose novel random field models that jointly learn shape representation and
object segmentation. Unlike previous work that uses shape representation as a prior,
our models emphasize the structured prediction from the recognition model to the shape
model. This difference ensures that the shape is well preserved in the resulting segmentation
masks with robustness to partial occlusions. Third, we develop a novel nonparametric
method based on multiscale shape transfer, which in turn forms a higher-order random
field. Compared to previous work that transfers rigid or deformable masks in image subwindows, our method explores shape masks at multiple granularities and produces high-quality segmentations efficiently. Last but not least, we develop a novel scene parsing system in which small objects are segmented in context. With extensive use of multiscale context and particular care for the long-tailed label distribution, our system
demonstrates state-of-the-art results on large-scale problems.