Deep-Learning-based Video Analysis for Human Action Evaluation
Open Access Publications from the University of California


UC San Diego Electronic Theses and Dissertations


No data is associated with this publication.

Video analysis provides an automatic way to extract meaningful information from video content, so it can be applied in healthcare to evaluate human action patterns for purposes such as biometrics estimation and performance assessment. In recent years, rapid progress in deep learning and portable medical sensors has made computer-vision-based measurement of human action patterns more affordable and accurate, enabling more efficient video analysis systems for action evaluation in home and clinic environments.

We investigate novel uses of video analysis for healthcare monitoring, including objective biometrics estimation and subjective action quality assessment. We propose a deep learning framework that extracts spatial-temporal features from 3D body landmarks with a graph convolutional neural network and estimates biometrics or performance scores. This offers a portable route to gold-standard biometrics grounded in the 3D multi-joint coordination underlying body movements, and it can provide real-time feedback on movement performance during rehabilitation exercises.

For biometrics estimation, in Chapter 2 we propose two single-task models, for video-level and frame-level estimation respectively, and a multi-task learning approach that estimates center-of-pressure (CoP) metrics at both temporal levels in parallel. To facilitate this line of research, we collect and release a novel computer-vision-based 3D body landmark dataset built with pose estimation. We also extend the framework to a traditional kinematics dataset, collected with on-body reflective markers, by using adaptive graph convolution.

For action quality assessment, in Chapter 3 we propose a deep learning framework for automatic assessment of physical rehabilitation exercises using a graph convolutional network with self-supervised regularization.
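To illustrate the core operation behind the landmark-based framework, the sketch below shows a single graph convolution over a skeleton graph in numpy. It is a minimal, hypothetical example, not the thesis implementation: the 5-joint chain skeleton, feature width, and weights are all placeholders (a real pose graph typically has 20-30 joints, and the weights would be learned).

```python
import numpy as np

def normalized_adjacency(A):
    """Symmetrically normalize A with self-loops: D^-1/2 (A + I) D^-1/2."""
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def gcn_layer(X, A_norm, W):
    """One graph-convolution layer: ReLU(A_norm @ X @ W)."""
    return np.maximum(A_norm @ X @ W, 0.0)

# Toy 5-joint chain skeleton (hypothetical; stands in for a full body graph).
A = np.zeros((5, 5))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4)]:
    A[i, j] = A[j, i] = 1.0

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))   # 3D coordinates per joint at one frame
W = rng.standard_normal((3, 16))  # projection to 16 features (would be learned)

H = gcn_layer(X, normalized_adjacency(A), W)
print(H.shape)  # (5, 16): one 16-dim feature vector per joint
```

Stacking such layers over joints, and pairing them with temporal convolutions across frames, is the usual way spatial-temporal features are built from landmark sequences; a regression head on the pooled features would then output a biometric or performance score.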
To further improve the accessibility of the real-time CoP metrics estimation system, Chapter 4 investigates a view-invariant video-level CoP metrics estimation framework that uses a single RGB camera, which could significantly ease data collection in home and clinic environments. Chapter 5 explores a semi-supervised learning framework for video-level CoP metrics estimation from partially labeled data, where only a small portion of samples carry labels. Together, the proposed methods could enable a more affordable, comprehensive, and portable virtual therapy system than existing tools provide.
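The abstract does not spell out the semi-supervised objective; one common recipe for regression with few labels is consistency regularization, sketched below. Everything here is illustrative: the linear `model` stands in for the GCN, and the batch sizes, noise scale, and loss weighting are made-up values.

```python
import numpy as np

rng = np.random.default_rng(1)

def model(x, w):
    """Stand-in regressor: a single linear layer (the real model is a GCN)."""
    return x @ w

# Hypothetical batch: 4 labeled and 12 unlabeled feature vectors.
w = rng.standard_normal((8, 1))
x_lab = rng.standard_normal((4, 8))
y_lab = rng.standard_normal((4, 1))
x_unl = rng.standard_normal((12, 8))

# Supervised term: MSE on the small labeled portion.
sup = np.mean((model(x_lab, w) - y_lab) ** 2)

# Unsupervised term: predictions should agree across two perturbed views
# of the same unlabeled samples (consistency regularization).
view_a = x_unl + 0.01 * rng.standard_normal(x_unl.shape)
view_b = x_unl + 0.01 * rng.standard_normal(x_unl.shape)
cons = np.mean((model(view_a, w) - model(view_b, w)) ** 2)

loss = sup + 0.5 * cons  # 0.5 is an illustrative weighting
```

Minimizing such a combined loss lets the unlabeled videos shape the learned representation while the few labeled ones anchor the CoP metric scale.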


This item is under embargo until January 5, 2025.