Beyond dynamic textures: a family of stochastic dynamical models for video with applications to computer vision
- Author(s): Chan, Antoni Bert
- et al.
One family of visual processes relevant to many computer vision applications can be loosely described as ensembles of particles subject to stochastic motion. The particles can be microscopic (e.g. plumes of smoke), macroscopic (e.g. leaves blowing in the wind), or even objects (e.g. a human crowd or a traffic jam). The applications range from remote monitoring for the prevention of natural disasters (e.g. forest fires), to background subtraction in challenging environments (e.g. outdoor scenes with moving trees in the background), to surveillance (e.g. traffic monitoring). Despite their practical significance, the visual processes in this family still pose tremendous challenges for computer vision. In particular, the stochastic nature of the motion fields is highly challenging for traditional motion representations such as optical flow, parametric motion models, and object tracking. Recent efforts have moved towards modeling video motion probabilistically, by viewing video sequences as "dynamic textures" or, more precisely, as samples from a generative, stochastic texture model defined over space and time. Despite its successes in applications such as video synthesis, motion segmentation, and video classification, the dynamic texture model has several major limitations, such as an inability to account for visual processes consisting of multiple co-occurring textures (e.g. smoke rising from a fire), and an inability to model complex motion (e.g. panning camera motion). We propose a family of dynamical models for video that address the limitations of the dynamic texture, and apply these new models to challenging computer vision problems. In particular, we introduce two multi-modal models for video, the mixture of dynamic textures and the layered dynamic texture, which provide principled frameworks for video clustering and motion segmentation.
We also propose a non-linear model, the kernel dynamic texture, which can capture complex patterns of motion through a non-linear manifold embedding. We present a new framework for the classification of dynamic textures, which combines the modeling power of the dynamic texture with the generalization guarantees of the support vector machine classifier, by deriving a new probabilistic kernel based on the Kullback-Leibler divergence between dynamic textures. Finally, we demonstrate the applicability of these models to a wide variety of real-world computer vision problems, including motion segmentation, video clustering, video texture classification, highway traffic monitoring, crowd counting, and adaptive background subtraction. We also demonstrate that the dynamic texture is a suitable representation for musical signals, by applying the proposed models to the computer audition task of song segmentation. These successes validate the dynamic texture framework as a principled approach for representing video, and suggest that the models could be useful in other domains, such as computer audition, that require the analysis of time-series data.
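The abstract describes a dynamic texture as a sample from a generative, stochastic model over space and time, but does not give its equations. As a rough illustration only, the sketch below assumes the linear dynamical system form commonly used for dynamic textures in the literature: a hidden state evolves as x[t+1] = A x[t] + v[t], and each vectorized frame is observed as y[t] = C x[t] + w[t]. The function name, the toy matrices, and the isotropic Gaussian noise are assumptions made for this example, not details taken from the thesis.

```python
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def sample_dynamic_texture(A, C, state_std, obs_std, x0, num_frames, seed=0):
    """Draw frames from an assumed linear dynamical system.

    A: state transition matrix (n x n)
    C: observation matrix mapping state to pixels (m x n)
    state_std, obs_std: standard deviations of the isotropic Gaussian
        state and observation noise (a simplifying assumption)
    x0: initial hidden state (length n)
    Returns a list of num_frames frames, each a list of m pixel values.
    """
    rng = random.Random(seed)
    x = list(x0)
    frames = []
    for _ in range(num_frames):
        # Observation: y[t] = C x[t] + w[t]
        frames.append([c + rng.gauss(0.0, obs_std) for c in matvec(C, x)])
        # State update: x[t+1] = A x[t] + v[t]
        x = [a + rng.gauss(0.0, state_std) for a in matvec(A, x)]
    return frames


# Toy example: a 2-dimensional state driving a 3-pixel "video".
A = [[0.9, -0.2], [0.2, 0.9]]             # damped rotation, so the state stays bounded
C = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # hypothetical state-to-pixel map
frames = sample_dynamic_texture(A, C, 0.1, 0.01, [1.0, 0.0], num_frames=5)
```

In this picture, the multi-modal models named in the abstract can be read as combining several such systems, and the kernel dynamic texture as replacing the linear observation map with a non-linear one; the sketch covers only the basic single-texture case.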