Visual information is an increasingly important part of the data ecosystem. Although photos are captured and shared widely over the internet, methods for searching and organizing images remain unintuitive. The current state of the art in image organization extracts people, objects, location, and time information from images and offers automatic grouping of photos based on one of these attributes. As a next step in image organization, we believe that images should be grouped by the events they represent. However, existing event-based grouping of images remains primitive and imprecise.
In this thesis, we present an event-based image organization approach whose key idea is to leverage visual concepts and the spatio-temporal metadata of images in order to automatically infer the event each image represents; the approach combines clustering with probabilistic learning methods. Images are first clustered on spatio-temporal metadata, so that each cluster represents an event that occurred at a particular spatio-temporal point or region. We then build probabilistic models that learn the associations between the features and the predefined event label of each cluster, and use the learned models to automatically infer the events in incoming photo streams. We evaluate the effectiveness of the proposed method on a personal image data set, using precision and recall as metrics.
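The two-stage pipeline described above can be sketched in miniature. The following is an illustrative sketch, not the thesis implementation: the greedy distance/time thresholds, a Naive Bayes classifier over visual concept tokens, and all field names and sample data are assumptions chosen for brevity.

```python
from collections import defaultdict
import math

def km_between(a, b):
    # Crude equirectangular approximation; adequate for nearby points.
    dlat = (a["lat"] - b["lat"]) * 111.0
    dlon = (a["lon"] - b["lon"]) * 111.0 * math.cos(math.radians(a["lat"]))
    return math.hypot(dlat, dlon)

def cluster_photos(photos, dist_km=1.0, gap_h=6.0):
    """Greedy spatio-temporal clustering: a photo joins the current
    cluster if it is close in both space and time to its last photo."""
    clusters = []
    for p in sorted(photos, key=lambda x: x["t"]):
        if (clusters and p["t"] - clusters[-1][-1]["t"] <= gap_h
                and km_between(p, clusters[-1][-1]) <= dist_km):
            clusters[-1].append(p)
        else:
            clusters.append([p])
    return clusters

def train_nb(labeled):
    """labeled: list of (event_label, [visual concept tokens])."""
    prior = defaultdict(int)
    counts = defaultdict(lambda: defaultdict(int))
    vocab = set()
    for label, concepts in labeled:
        prior[label] += 1
        for c in concepts:
            counts[label][c] += 1
            vocab.add(c)
    return prior, counts, vocab

def predict_nb(model, concepts):
    """Return the most probable event label for a bag of concepts."""
    prior, counts, vocab = model
    total = sum(prior.values())
    best, best_lp = None, float("-inf")
    for label in prior:
        lp = math.log(prior[label] / total)
        denom = sum(counts[label].values()) + len(vocab)  # Laplace smoothing
        for c in concepts:
            lp += math.log((counts[label][c] + 1) / denom)
        if lp > best_lp:
            best, best_lp = label, lp
    return best

# Toy usage: two nearby photos form one cluster, a distant one another.
photos = [
    {"t": 0.0, "lat": 40.0, "lon": -74.0},
    {"t": 1.0, "lat": 40.001, "lon": -74.001},
    {"t": 20.0, "lat": 41.0, "lon": -75.0},
]
clusters = cluster_photos(photos)  # -> 2 clusters

model = train_nb([
    ("wedding", ["cake", "dress", "flowers"]),
    ("hike", ["trail", "mountain", "backpack"]),
])
print(predict_nb(model, ["mountain", "trail"]))  # -> hike
```

In a full system, an incoming photo stream would first be clustered, then each cluster's aggregated visual concepts fed to the learned model to annotate the cluster with an event label.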
The contributions of this thesis are two-fold. First, we use several web-based sources to semantically augment the spatio-temporal metadata of the images; second, we combine clustering with probabilistic learning to identify and annotate events in the photo stream, using the augmented metadata and visual image concepts.