eScholarship
Open Access Publications from the University of California

Visual Representations for Fine-grained Categorization

  • Author(s): Zhang, Ning
  • Advisor(s): Darrell, Trevor
  • et al.
Abstract

In contrast to basic-level object recognition, fine-grained categorization aims to distinguish between subordinate categories, such as different animal breeds or species, plant species or man-made product models. The problem can be extremely challenging due to the subtle differences in the appearance of certain parts across related categories and often requires distinctions that must be conditioned on the object pose for reliable identification. Discriminative markings are often highly localized, leading traditional object recognition approaches to struggle with the large pose variations often present in these domains. Face recognition is the classic case of fine-grained recognition, and it is noteworthy that the best face recognition methods jointly discover facial landmarks and extract features from those locations. We propose pose-normalized representations, which align training exemplars, either piecewise by part or globally for the whole object, effectively factoring out differences in pose and in camera viewing angle.

I first present methods that apply pose normalization to two related applications: human attribute classification and person recognition beyond frontal faces. Following the recent success of deep learning, we use deep convolutional features as feature representations. Next, I introduce the part-based RCNN method, which extends the state-of-the-art object detection method RCNN to fine-grained categorization. The model learns both whole-object and part detectors and enforces learned geometric constraints between them. I also show results of using the recent compact bilinear features to generate the pose-normalized representations. However, bottom-up region proposals are limited by hand-engineered features, so in the final work I present a fully convolutional deep network, trained end-to-end for part localization and fine-grained classification.
