Leveraging Occlusion Cues for Causal Video Object Segmentation
- Author(s): Taylor, Brian
- Advisor(s): Soatto, Stefano
This thesis describes a framework that leverages occlusions as a cue for detecting objects and accurately localizing their boundaries throughout the course of a video. Triggered by the motion of objects in the scene, occlusions provide coarse knowledge of the spatial relationship of objects with respect to the viewer. While occlusions are effective for detecting objects when motion is sufficient, we explore ways to reliably detect and track objects when motion is inadequate or difficult to estimate.
In the first half, we incorporate semantic classifiers to provide cues when occlusions are weak, and observe that occlusion and appearance information are mutually beneficial, yielding results more resilient to failures of either component system acting alone. We evaluate this system on the semantic segmentation task. In the latter half, we drop semantics and instead devise a causal framework that integrates segmentation results and occlusion cues from previously processed frames. So long as objects move sufficiently with respect to the viewer at some point, they are detected and subsequently tracked for the rest of the video. We evaluate this approach on the video object segmentation problem. The resulting system can automatically discover objects from occlusions in video and track their shapes as they evolve over time. Coarse depth is provided as a byproduct, and the assignment of semantic category labels can be integrated in a natural way.