Deep Learning in 3D Hand Pose and Mesh Estimation
- Chen, Liangjian
- Advisor(s): Xie, Xiaohui
Abstract
3D hand pose estimation is an important problem because of its wide range of potential applications, such as sign language translation, robotics, movement disorder detection and monitoring, and human-computer interaction (HCI). However, despite previous progress, it remains a challenging problem in computer vision due to the difficulty of acquiring high-quality hand pose annotations. In this dissertation, we develop several approaches to address this problem, aiming either to achieve better estimation accuracy or to provide an easier training setting. First, to bridge the image quality gap between synthetic and real-world datasets, we propose TAGAN (Tonality-Aligned Generative Adversarial Networks) to produce more realistic hand pose images. Second, to relax the requirement of paired RGB and depth images shared by most state-of-the-art 3D hand pose estimators, we propose DGGAN (Depth-image Guided Generative Adversarial Networks), which allows such estimators to be trained on RGB-only datasets. Third, since accurate 3D hand pose annotations are very difficult to acquire, we propose TASSN (Temporal-Aware Self-Supervised Network), which uses temporal consistency constraints to learn 3D hand poses and meshes from videos annotated with only 2D keypoint positions. Last but not least, since 3D hand pose estimation from a single image is intrinsically ill-posed, we tackle the problem from a multi-view perspective by building a multi-view hand mesh benchmark. We design a spin match algorithm that enables a rigid mesh model to match any target mesh ground truth, and, based on this matching algorithm, we propose an efficient pipeline to generate a large-scale multi-view hand mesh (MVHM) dataset with accurate 3D hand mesh and joint labels.
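To make the TASSN idea concrete, the following is a minimal sketch (not the dissertation's actual formulation) of how 2D-only supervision can be combined with a temporal consistency term: predicted 3D joints are projected with an assumed simple pinhole camera and compared against the annotated 2D keypoints, while frame-to-frame changes in the 3D predictions are penalized. All function names and the camera model here are illustrative assumptions.

```python
import numpy as np

def reprojection_loss(joints_3d, keypoints_2d, focal=1.0):
    """2D supervision: project predicted 3D joints (..., J, 3) with an
    assumed pinhole camera and compare to annotated 2D keypoints (..., J, 2)."""
    proj = focal * joints_3d[..., :2] / joints_3d[..., 2:3]  # perspective divide
    return np.mean((proj - keypoints_2d) ** 2)

def temporal_consistency_loss(joints_3d_seq):
    """Self-supervision: penalize large frame-to-frame jumps in the
    predicted 3D joint sequence of shape (T, J, 3)."""
    diffs = joints_3d_seq[1:] - joints_3d_seq[:-1]
    return np.mean(diffs ** 2)

def total_loss(joints_3d_seq, keypoints_2d_seq, lam=0.1):
    """Combined objective: 2D reprojection plus weighted temporal smoothness."""
    return (reprojection_loss(joints_3d_seq, keypoints_2d_seq)
            + lam * temporal_consistency_loss(joints_3d_seq))
```

A static hand whose projections match the 2D annotations incurs zero loss under both terms, while a jittery 3D trajectory is penalized even when each frame reprojects correctly, which is the mechanism that lets 2D-only video annotations constrain the 3D solution.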