Learning Single-view 3D Reconstruction of Objects and Scenes
- Author(s): Tulsiani, Shubham
- Advisor(s): Malik, Jitendra
- et al.
We address the task of inferring the 3D structure underlying an image, in particular focusing on two questions -- how we can plausibly obtain supervisory signal for this task, and what forms of representation should we pursue. We first show that we can leverage image-based supervision to learn single-view 3D prediction, by using geometry as a bridge between the learning systems and the available indirect supervision. We demonstrate that this approach enables learning 3D structure across diverse setups e.g. learning deformable models, predctive models for volumetric 3D, or inferring textured meshes. We then advocate the case for inferring interpretable and compositional 3D representations. We present a method that discovers the coherent compositional structure across objects in a unsupervised manner by attempting to assemble shapes using volumetric primitives, and then demonstrate the advantages of predicting similar factored 3D representations for complex scenes.