Open Access Publications from the University of California

UC Berkeley

UC Berkeley Electronic Theses and Dissertations

Beyond Bounding Boxes: Precise Localization of Objects in Images


Object recognition in computer vision comes in many flavors, two of the most popular being object detection and semantic segmentation. Object detection systems detect every instance of a category in an image and coarsely localize each with a bounding box. Semantic segmentation systems assign category labels to pixels, thus providing pixel-precise localization but failing to resolve individual instances of the category. We argue for a richer output: recognition systems should detect individual instances of a category and provide a pixel-precise segmentation for each, a task we call Simultaneous Detection and Segmentation (SDS). We describe approaches to this task that leverage convolutional neural networks for precise localization. We also show that the techniques we develop are effective for other tasks, such as segmenting the parts of a detected object or localizing its keypoints. These are our first steps towards a recognition system that goes beyond category labels and coarse bounding boxes to precise, detailed descriptions of objects in images.
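To make the difference between the three output types concrete, the following is a minimal sketch in plain Python. It is not taken from the thesis: the toy grid `INSTANCE_MAP` and the helper names `bounding_box`, `semantic_mask`, and `sds_output` are hypothetical, chosen only for illustration. The grid holds two touching instances of a single category, and each helper derives one kind of output from it.

```python
# Hypothetical toy ground truth: 0 = background, 1 and 2 = two instances
# of the same category. The layout is an assumption made for illustration.
INSTANCE_MAP = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 2, 2, 0],
    [0, 0, 1, 2, 2, 0],
    [0, 0, 0, 2, 0, 0],
]


def instance_ids(instance_map):
    """Return the sorted non-background instance ids present in the map."""
    return sorted({v for row in instance_map for v in row if v != 0})


def bounding_box(instance_map, instance_id):
    """Detection-style output: inclusive (top, left, bottom, right) box."""
    cells = [
        (r, c)
        for r, row in enumerate(instance_map)
        for c, v in enumerate(row)
        if v == instance_id
    ]
    rows = [r for r, _ in cells]
    cols = [c for _, c in cells]
    return (min(rows), min(cols), max(rows), max(cols))


def semantic_mask(instance_map):
    """Semantic-segmentation-style output: one category mask, instances merged."""
    return [[1 if v != 0 else 0 for v in row] for row in instance_map]


def sds_output(instance_map):
    """SDS-style output: one pixel-precise binary mask per detected instance."""
    return {
        i: [[1 if v == i else 0 for v in row] for row in instance_map]
        for i in instance_ids(instance_map)
    }


if __name__ == "__main__":
    for i in instance_ids(INSTANCE_MAP):
        print("instance", i, "box:", bounding_box(INSTANCE_MAP, i))
    print("semantic mask:", semantic_mask(INSTANCE_MAP))
    print("per-instance masks:", sds_output(INSTANCE_MAP))
```

In this toy case each box encloses six pixels although only five belong to its instance, and the semantic mask joins the two instances into one connected region. The per-instance masks keep both the instance identity and the exact pixels.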
