Human Activity Understanding and Prediction with Stochastic Grammar
- Jia, Baoxiong
- Advisor(s): Zhu, Song-chun
Abstract
Video understanding is an active research problem in computer vision. Since spatial and temporal information are inherently entangled in video, the main difficulty has been building a unified framework in which the two aspects can be modeled jointly. Among video understanding tasks, human activity understanding and prediction serve as a good starting point for testing the spatial-temporal reasoning capability of learning modules. Most current approaches to human activity understanding and prediction use deep neural networks for spatial-temporal reasoning. However, this type of approach lacks the ability to reason beyond local frames and to conduct long-term temporal reasoning. Stochastic grammar models, on the other hand, model observed sequences at a symbolic level with all history information considered, but they handle noisy input sequences poorly. Given these insights and the limitations of current approaches, we propose the generalized Earley parser to bridge the gap between sequence inputs and symbolic grammars. By combining the advantages of these two types of methods, we show that the proposed model achieves better performance on both human activity recognition and future prediction.