Towards Sparse Modeling of Multi-Object Interactions in Video
In this dissertation, we develop methodologies for the modeling and recognition of activities in continuous videos. Videos usually consist of activities involving interactions between multiple actors. Recognizing such activities requires modeling both the spatio-temporal relationships between the actors and their individual variability. We propose the generalized framework of ``String-of-Feature-Graph'', which implicitly quantifies the spatial and temporal relationships between interacting objects by modeling the spatio-temporal relationships between local motion features. Furthermore, activities related in space and time rarely occur independently and can serve as context for each other. Thus, rather than modeling only feature-level context, we also implicitly or explicitly model the contextual relationships between activities. Specifically, we use probabilistic graphical models, in a max-margin framework, to jointly model and recognize related activities in space and time using motion features and various context features within and between actions and activities. We call these models context-aware graphical models.
When such models are discriminatively trained, they often rely on redundant features that are highly correlated with each other. Sparse features are likely to be preferable in such situations because: 1) when model features are sparse, the parameters can be estimated more efficiently and effectively; 2) the intrinsic and contextual attributes, as well as the association rules, of inter-dependent objects are usually sparse. Thus, we develop a sparse modeling framework, building upon the proposed context-aware graphical models and group $\ell_1$-regularization, to enhance the efficiency and accuracy of activity recognition. The proposed framework is general enough to apply to the recognition of any type of inter-dependent visual objects, such as visual activities and image objects.
In real activity recognition applications, not all types of activities are known in advance or present in the training examples. Approaches are therefore needed to detect abnormal activities that differ in some respect from the known or training activities. As an extension of the proposed context-aware models for activity recognition, we further address the detection of anomalous activities. We define three types of anomalous activities with abnormal motion and/or context behaviors. Given a context-aware graphical model learned from normal activities, we use statistical inference methods to detect anomalous activities whose motion and context patterns deviate from the learned patterns. Our studies advance computer vision and pattern recognition through the demonstrated benefits of the proposed approaches over state-of-the-art methods.