Attention Models for Activity Detection
- Author(s): Ulutan, Oytun
- Advisor(s): Manjunath, Bangalore S
Video action detection is an important part of video understanding and analysis. It has many possible applications, such as smart home environments that recognize user actions and respond accordingly; robotics applications such as autonomous cars, robot assistants, and selfie drones that follow gesture commands; and automated security systems that analyze the environment and assess events. This thesis focuses on introducing novel machine learning algorithms for video action detection. A central contribution of this research is the development of a context-aware attention model for atomic actions. An atomic action is a simple action that can be described with 1-3 words or atomic body movements, such as walking, drinking, or holding an object. While observing actions and activities, humans infer from the entire context: our perception depends on the surrounding objects, actors, and scene. Inspired by this, our Actor Conditioned Attention Maps (ACAM) model utilizes the surrounding scene for each actor and uses context to improve action/interaction detection. The modularity of the ACAM model allows us to detect, track, and recognize actions over extended time periods. We further extend this framework to detect complex activities that are composed of sequences of atomic actions. We demonstrate the effectiveness of our proposed methods on aerial videos and videos from camera networks.
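The core idea of conditioning attention on each actor can be illustrated as dot-product attention over scene context features, keyed by an actor's feature vector. The sketch below is a deliberate simplification of that idea, not the exact ACAM architecture from the thesis; the function name, feature shapes, and scaling choice are all assumptions for illustration.

```python
import numpy as np

def actor_conditioned_attention(context, actor):
    """Weight scene context features by their similarity to one actor's feature.

    context: (N, d) array of features from N surrounding scene locations.
    actor:   (d,)   feature vector for a single detected actor.
    Returns the (N,) attention map and the (d,) attended context feature.
    """
    d = actor.shape[0]
    scores = context @ actor / np.sqrt(d)   # similarity of each location to the actor
    scores -= scores.max()                  # numerical stability before softmax
    attn = np.exp(scores) / np.exp(scores).sum()
    attended = attn @ context               # context summary conditioned on this actor
    return attn, attended

# Toy example: 4 scene locations with 8-dim features and one actor.
rng = np.random.default_rng(0)
context = rng.standard_normal((4, 8))
actor = rng.standard_normal(8)
attn, attended = actor_conditioned_attention(context, actor)
print(attn.sum())  # attention weights sum to 1
```

Because the attention map is computed per actor, the same scene features yield a different context summary for each detected person, which matches the abstract's point that each actor's actions are interpreted relative to the surrounding objects, actors, and scene.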