Assigning categorical labels to objects in images has proven to be a significant challenge for automated systems. As cameras proliferate throughout society, however, we will necessarily depend more heavily on computers to help us label and sort our images. This work addresses the problem of assigning categorical labels to images. We contend that to perform this task effectively, we should also consider which part of the image contains the object.
We examine the sensitivity of feature detection to nuisances and propose a new feature detector based on a tree of segmentations. When a detector is not required, we describe a fast adaptation that extracts a popular descriptor (SIFT) on a dense grid over the image. Next, we show that a dictionary constructed for the task of categorization can be both smaller and more accurate than one constructed to represent the data alone. We explore splitting descriptors along segmentation boundaries and show that knowing which part of an image contains the object can make a large difference in accuracy. With these pieces, we construct a fast and accurate pixel-level categorization technique. Then, we move from pixels to small homogeneous collections of pixels (superpixels) and exploit the neighborhood structure of these superpixels to form a precise superpixel-level categorization.
Finally, the appendix discusses open software we have developed and released, including a GPU implementation of a segmentation algorithm (quick shift) and a MATLAB experiment framework (Blocks) that implements the techniques described in the thesis.