Learning from local image regions
- Author(s): Dollár, Piotr
- et al.
A trend in computer vision over the last decade or so has been to describe the statistics and content of images in terms of local image regions, i.e., image patches. Applications have included object detection, scene recognition, texture classification, and image categorization. Local patch-based representations have the advantage of being robust to global transformations, occlusion, clutter, and object and image variation, while retaining rich information about image content. This holds even when global information about the relative position of patches is not used, as in so-called "bags of words" approaches. Furthermore, in the supervised learning framework, where labeled images are a source of data, characterizing images by their patches means that a single image can provide a large number of training examples. These properties suggest that local patch-based representations should continue to find expanded use in computer vision. In this dissertation we apply patch-based methods to three domains in which more global approaches have traditionally been used. First, we show how the classic problem of edge detection can be posed as a series of patch-by-patch decisions that can be solved in a supervised learning framework. We apply this approach to a number of specific domains, such as mouse boundary detection and road detection. Second, we show how object warps and highly non-linear image transformations can also be modeled locally, thus avoiding the computational challenges and the scarcity of data typically associated with these problems. For example, our approach is able to learn eye motion and the out-of-plane rotation of a teacup from sparse data. Third, we extend the notion of local regions from 2D to 3D, i.e., from patches to cuboids, in order to model the content of video. We show applications to behavior recognition in a number of domains, including human activity and mouse behavior.
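As a rough illustration of why a single labeled image yields many training examples, and of how patches generalize to cuboids, the following is a minimal sketch and not the dissertation's implementation. The function names, the dense sliding-window sampling, and the choice to label each patch by the ground-truth edge value at its center pixel are all assumptions made for this example; the abstract does not specify how patches or cuboids are sampled or labeled.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def extract_patches(image, size, stride=1):
    """Return every size x size patch of a 2D image as one row of a matrix.

    Dense sampling on a regular grid is an assumption of this sketch.
    """
    # Shape: (H - size + 1, W - size + 1, size, size)
    windows = sliding_window_view(image, (size, size))
    windows = windows[::stride, ::stride]
    return windows.reshape(-1, size * size)


def center_labels(edge_map, size, stride=1):
    """Label each patch by the ground-truth edge value at its center pixel.

    Hypothetical labeling rule, chosen only to illustrate posing edge
    detection as one supervised decision per patch.
    """
    half = size // 2
    h, w = edge_map.shape
    centers = edge_map[half:h - size + half + 1, half:w - size + half + 1]
    return centers[::stride, ::stride].reshape(-1)


def extract_cuboids(video, size, length, stride=1):
    """Return every length x size x size cuboid of a (T, H, W) video as a row.

    The 3D analogue of extract_patches; dense sampling is again an assumption.
    """
    windows = sliding_window_view(video, (length, size, size))
    windows = windows[::stride, ::stride, ::stride]
    return windows.reshape(-1, length * size * size)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((8, 8))                       # toy grayscale image
    edge_map = (rng.random((8, 8)) > 0.8).astype(int)  # toy ground-truth edges
    X = extract_patches(image, size=3)
    y = center_labels(edge_map, size=3)
    print(X.shape, y.shape)   # one (patch, label) pair per patch position

    video = rng.random((5, 8, 8))                    # toy (T, H, W) video
    C = extract_cuboids(video, size=3, length=3)
    print(C.shape)
```

The rows of `X` paired with `y` could be passed to any off-the-shelf classifier, which would then make one edge or non-edge decision per patch; the rows of `C` play the same role for video.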
The methods we introduce here advance the state of the art and have the potential to be useful in a broad range of applications in computer vision. Our approach to edge detection currently outperforms all competing approaches for grayscale edge detection and comes in a close second for color edge detection on the well-established Berkeley Segmentation Dataset. We hope it will play a role similar to that of Canny edge detection, but for highly textured, real-world images. Our approach to modeling object warps locally showed dramatic improvements over previous such methods, and helped solidify the theoretical foundation of nonlinear manifold learning. Finally, our cuboids formalism is simple yet powerful, and has already been used in two vision systems. It has the potential to serve as the basis for a broad range of methods for describing the contents of video. Overall, our contribution has been to help establish the importance of patch-based approaches and to expand our understanding of a fundamental aspect of computer vision.