Occlusions and Their Role in Object Detection in Video
Occlusions and disocclusions are essential cues for human perception in understanding the layout of a scene. By analyzing how some parts of the scene go out of sight (become occluded) and new parts appear (become disoccluded), one can infer the topology of the objects in it. Since scene geometry and its dynamics induce these phenomena, they are fundamental cues in computer vision and video processing tasks such as visual exploration, object recognition, activity recognition, tracking, and video compression.
In this thesis, we first introduce three methods to detect occlusions in an image sequence: (1) a motion segmentation algorithm that partitions an oversegmented image into two parts: regions on which optical flow is expressed as a piecewise-constant field, and occluded regions where flow is not defined; (2) an optical flow estimation method that additionally detects occlusions, modeling them as a sparse subset of the image domain; and (3) a saliency detection algorithm that detects the parts of the image domain whose motion is inconsistent with the camera motion. In the second part of the thesis, we show that the problem of object detection in a video can be cast as an unsupervised segmentation scheme using occlusion cues and solved with convex optimization for an unknown number and geometry of objects in the scene. We further extend this approach by incorporating semantic priors for object categories that are learned from object recognition datasets. This enables the detection algorithm to segment and categorize objects jointly.
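To make the notion of occlusion detection concrete, the sketch below shows one common heuristic: a pixel is flagged as occluded when its forward optical flow is not undone by the backward flow at the point it maps to. This forward-backward consistency check is a standard illustration, not the specific formulations developed in the thesis; the function name, the nearest-neighbour sampling, and the threshold `tol` are assumptions chosen for brevity.

```python
import numpy as np

def occlusion_mask(flow_fw, flow_bw, tol=1.0):
    """Flag pixels whose forward flow is not undone by the backward flow.

    flow_fw, flow_bw: (H, W, 2) arrays of (dx, dy) displacements between
    consecutive frames. A large forward-backward residual suggests the
    pixel has no valid correspondence in the next frame, i.e. it is
    occluded. `tol` is an illustrative threshold in pixels.
    """
    h, w, _ = flow_fw.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Position each pixel maps to in the next frame (clipped to the image).
    xt = np.clip(np.round(xs + flow_fw[..., 0]).astype(int), 0, w - 1)
    yt = np.clip(np.round(ys + flow_fw[..., 1]).astype(int), 0, h - 1)
    # Backward flow sampled at the target position (nearest neighbour).
    bw = flow_bw[yt, xt]
    # For a visible pixel the two flows should roughly cancel.
    residual = np.linalg.norm(flow_fw + bw, axis=-1)
    return residual > tol
```

For a rigidly translating scene with consistent forward and backward flows the residual vanishes and no pixel is flagged; pixels near a disappearing surface typically have an inconsistent backward flow and exceed the threshold.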