eScholarship
Open Access Publications from the University of California

UCLA

UCLA Electronic Theses and Dissertations

Human Activity Understanding and Prediction with Stochastic Grammar

Abstract

Video understanding is an active research problem in computer vision. In video, spatial and temporal information are inherently entangled, and the main challenge has been building a unified framework that models the two jointly. Among video understanding tasks, human activity understanding and prediction serve as a good starting point for testing the spatial-temporal reasoning capability of learning modules. Most current approaches to these problems use deep neural networks for spatial-temporal reasoning. However, this type of approach lacks the ability to reason beyond local frames and to conduct long-term temporal reasoning. Stochastic grammar models, on the other hand, model observed sequences at a symbolic level with all history information considered, but they handle noisy input sequences poorly. Given these insights and the problems of current approaches, we propose the generalized Earley parser to bridge the gap between sequence inputs and symbolic grammars. By combining the advantages of these two types of methods, we show that the proposed model achieves better performance on both human activity recognition and future prediction.
