Modeling Eye Tracking Data with Application to Object Detection
This research focuses on enhancing computer vision algorithms using eye tracking and visual saliency. Recent advances in eye tracking technology have enabled large-scale collection of eye tracking data without affecting the viewer experience. Because eye tracking data is biased towards high-level image and video semantics, it provides a valuable prior for object detection in images and object extraction in videos. The thesis explores three problems: 1) eye tracking and saliency enhanced object detection, 2) eye tracking assisted object extraction in videos, and 3) the role of object co-occurrence and camera focus in visual attention modeling.
Since human attention is biased towards faces and text, in the first work we propose an approach to isolate face and text regions in images by analyzing eye tracking data from multiple subjects. The eye tracking data is clustered, and region labels are predicted using a Markov random field model. In the second work, we study object extraction in videos using an eye tracking prior. We propose an algorithm that extracts dominant visual tracks from the eye tracking data of multiple subjects by solving a linear assignment problem. These visual tracks localize the object search, and we propose a novel mixed graph association framework in which inference is performed by binary integer linear programming. In the final work, we address the problem of predicting where people look in images. We specifically explore the importance of scene context in the form of object co-occurrence and camera focus. The proposed model extracts low-, mid-, and high-level features together with scene context features, and uses a regression framework to predict a visual attention map. In all three cases, extensive experimental results show that the proposed methods outperform the current state of the art.
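To illustrate the kind of linear assignment problem mentioned in the second work, the following is a minimal sketch, not the formulation used in the thesis. It assumes that gaze locations from two consecutive frames are matched one-to-one so that the total Euclidean distance is minimized. The function name `link_gaze_points`, the cost function, and the sample coordinates are all hypothetical, and the exhaustive search is only practical for a handful of points.

```python
from itertools import permutations
from math import dist


def link_gaze_points(prev_pts, next_pts):
    """Match each point in prev_pts to a distinct point in next_pts.

    Hypothetical illustration: the cost of a match is the Euclidean
    distance between the two points, and the assignment with the lowest
    total cost is found by exhaustive search over all injective matchings.
    """
    if len(prev_pts) > len(next_pts):
        raise ValueError("expects len(prev_pts) <= len(next_pts)")
    best_cost, best_match = float("inf"), ()
    # Each permutation maps prev_pts[i] to next_pts[perm[i]].
    for perm in permutations(range(len(next_pts)), len(prev_pts)):
        cost = sum(dist(prev_pts[i], next_pts[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_match = cost, perm
    return list(best_match), best_cost


# Made-up gaze locations (in pixels) for two consecutive frames.
prev_frame = [(100.0, 120.0), (300.0, 80.0)]
next_frame = [(305.0, 84.0), (102.0, 118.0), (500.0, 400.0)]
match, cost = link_gaze_points(prev_frame, next_frame)
print(match, round(cost, 3))
```

Here `match[i]` is the index in `next_frame` assigned to `prev_frame[i]`, and chaining such matches over successive frames yields candidate tracks. For realistic numbers of subjects and frames, a polynomial-time solver such as the Hungarian algorithm (available, for example, as `scipy.optimize.linear_sum_assignment`) would replace the exhaustive search.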