eScholarship
Open Access Publications from the University of California

Analyzing 3D Objects in 2D Images

  • Author(s): Hejrati, Seyed Mohammadmohsen
  • Advisor(s): Ramanan, Deva
  • et al.
Creative Commons 'BY' version 4.0 license
Abstract

Robots are mechanically capable of many tasks: carrying loads, precisely manipulating objects, picking and packing, and collaborating with humans. However, they require accurate 3D perception of objects and the surrounding environment to perform these tasks autonomously. Traditional methods build a 3D representation of the scene using structure-from-motion techniques or depth sensors, while more recent approaches use statistical models to learn the geometry and appearance of 3D objects and scenes. This thesis investigates approaches to represent, learn, and analyze 3D objects in natural images. We first propose two new methods for 3D object recognition and pose estimation in single 2D images. Second, we study various geometric representations for the novel task of primitive 3D shape categorization.

We propose two novel approaches for recognizing 3D objects: (1) Aligning a 3D model to detected 2D landmarks, where a method based on deformable-part models proposes candidate detections and 2D estimates of shape, and these estimates are then refined using an explicit 3D model of shape and viewpoint. (2) An analysis-by-synthesis approach, where a forward synthesis model constructs possible geometric interpretations of the world and then selects the interpretation that best agrees with the measured visual evidence. We show state-of-the-art performance for detection and pose estimation on two challenging 3D object recognition datasets of cars and cuboids.
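
As a rough illustration of the analysis-by-synthesis idea in approach (2), the toy sketch below synthesizes 2D landmark positions for a set of candidate viewpoints and keeps the candidate that best matches the observed landmarks. It is not the method of the thesis: the cuboid dimensions, the scaled orthographic camera, the single yaw parameter, and the names `project`, `reprojection_error`, and `best_yaw` are all assumptions made for this example.

```python
import math

# Hypothetical 3D model: the eight corners of a cuboid (arbitrary units).
CUBOID = [(x, y, z) for x in (-1.0, 1.0) for y in (-0.5, 0.5) for z in (-2.0, 2.0)]


def project(points3d, yaw, scale=1.0):
    """Forward synthesis: rotate about the vertical (y) axis by `yaw`,
    then drop depth (scaled orthographic projection)."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [(scale * (c * x + s * z), scale * y) for x, y, z in points3d]


def reprojection_error(pred2d, obs2d):
    """Sum of squared distances between corresponding 2D landmarks."""
    return sum((px - ox) ** 2 + (py - oy) ** 2
               for (px, py), (ox, oy) in zip(pred2d, obs2d))


def best_yaw(points3d, obs2d, n_candidates=360):
    """Synthesize one interpretation per candidate yaw and return the yaw
    whose projection agrees best with the observed landmarks."""
    candidates = [2 * math.pi * k / n_candidates for k in range(n_candidates)]
    return min(candidates,
               key=lambda yaw: reprojection_error(project(points3d, yaw), obs2d))
```

In this sketch the "visual evidence" is a list of 2D landmarks in known correspondence with the model corners. A real system would score candidates against image features and search over a much richer space of shapes and viewpoints.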

3D object recognition methods focus on modeling the 3D shape of objects. However, many objects have similar 3D shapes (washing machines, cabinets, and microwaves are all cuboidal), so recognizing them requires reasoning about appearance and geometry at the same time. The natural approach to recognition might be to extract pose-normalized appearance features. Though such approaches are extraordinarily common in the literature, in this thesis we demonstrate that they are not optimal. Instead, we introduce methods based on pose-synthesis, an approach that augments the training data with geometrically perturbed training samples. We demonstrate that synthesis is a surprisingly simple but effective strategy that allows for state-of-the-art categorization and automatic 3D alignment.
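
To make "augmenting training data with geometrically perturbed training samples" concrete, here is a minimal sketch under simplifying assumptions. Each sample is taken to be a list of 2D points with a category label, and the perturbation is a small random rotation, scaling, and shift. The thesis does not specify these details here, so the sample format, the perturbation ranges, and the names `perturb` and `augment` are hypothetical.

```python
import math
import random


def perturb(points, rng, max_rot=0.2, max_scale=0.1, max_shift=0.05):
    """Return a geometrically perturbed copy of a 2D point set
    (random small rotation, isotropic scale change, and translation)."""
    theta = rng.uniform(-max_rot, max_rot)
    scale = 1.0 + rng.uniform(-max_scale, max_scale)
    tx = rng.uniform(-max_shift, max_shift)
    ty = rng.uniform(-max_shift, max_shift)
    c, s = math.cos(theta), math.sin(theta)
    return [(scale * (c * x - s * y) + tx, scale * (s * x + c * y) + ty)
            for x, y in points]


def augment(dataset, copies_per_sample=5, seed=0):
    """Keep the original (points, label) samples and append perturbed
    copies of each one; labels are unchanged by the perturbation."""
    rng = random.Random(seed)
    augmented = list(dataset)
    for points, label in dataset:
        for _ in range(copies_per_sample):
            augmented.append((perturb(points, rng), label))
    return augmented
```

A classifier would then be trained on the output of `augment` rather than on pose-normalized features, which is the contrast the paragraph above draws.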
