eScholarship
Open Access Publications from the University of California

UC Merced Electronic Theses and Dissertations

Learning Visual Correspondences across Instances and Video Frames

Abstract

Correspondence is ubiquitous in our visual world. It describes the relationship between two images by identifying which parts of one image relate to which parts of the other, and it is a fundamental task in many computer vision applications. For instance, object tracking essentially studies the correspondence of parts on the same object through time, while semantic segmentation links the same semantic parts of different objects across space. The study of correspondence also facilitates applications such as structure from motion and label propagation through video frames. However, correspondence annotations are notoriously hard to harvest. Existing work either uses synthesized data (e.g., optical flow from a game engine) or other human annotations (e.g., semantic segmentation), leading to domain limitations or tedious human effort. My research focuses on learning and applying correspondence in computer vision tasks in a self-supervised manner to overcome these limitations. I begin by introducing a method that learns reliable dense correspondence from videos in a self-supervised manner. Next, I discuss two methods that use correspondence between images or video frames to facilitate 3D mesh reconstruction. First, I present a self-supervised, single-view 3D reconstruction model that predicts the 3D mesh shape, texture, and camera pose of a target object from a collection of 2D images and silhouettes. Building on these two works, a natural question is whether the correspondence learned in the first can be combined with the mesh reconstruction model in the second to reconstruct meshes from video frames. In the final work, I therefore present an algorithm that reconstructs temporally consistent 3D meshes of deformable object instances from videos in the wild.
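
To illustrate the label-propagation application mentioned above, the following is a minimal sketch (not the dissertation's specific algorithm) of how dense correspondence, expressed as a feature affinity between two frames, can transfer a segmentation label map from a reference frame to a target frame. The names feat_ref, feat_tgt, labels_ref, and temperature are illustrative assumptions, as is the choice of a softmax-weighted average; the encoder producing the features could be any self-supervised model.

# Sketch: propagate a K-class label map from a reference frame to a target
# frame using softmax-normalized feature affinity as the correspondence.
import torch
import torch.nn.functional as F

def propagate_labels(feat_ref, feat_tgt, labels_ref, temperature=0.07):
    # feat_ref, feat_tgt: (C, H, W) feature maps; labels_ref: (K, H, W) labels.
    C, H, W = feat_ref.shape
    # Flatten spatial dimensions and L2-normalize each location's feature.
    f_ref = F.normalize(feat_ref.reshape(C, -1), dim=0)   # (C, H*W)
    f_tgt = F.normalize(feat_tgt.reshape(C, -1), dim=0)   # (C, H*W)
    # Affinity between every target location and every reference location.
    affinity = f_tgt.t() @ f_ref                           # (H*W, H*W)
    weights = F.softmax(affinity / temperature, dim=1)     # rows sum to 1
    # Each target pixel's label is a weighted average of reference labels.
    labels_flat = labels_ref.reshape(labels_ref.shape[0], -1).t()  # (H*W, K)
    labels_tgt = weights @ labels_flat                     # (H*W, K)
    return labels_tgt.t().reshape(-1, H, W)                # (K, H, W)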
