Qin, Yuzhe

Learning Generalizable Dexterous Manipulation

2024

Abstract

Dexterous manipulation using multi-fingered robotic hands is a crucial area in robotics, aimed at performing intricate tasks with various objects in everyday environments. However, this field presents significant challenges. Modeling the complex contact patterns between a dexterous hand and manipulated objects is difficult, hindering the effectiveness of model-based control methods. Furthermore, the high number of degrees of freedom (DoF) in the hand's joints dramatically increases the complexity of training data-driven policies for dexterous manipulation.

This dissertation addresses the challenging task of learning highly generalizable dexterous manipulation skills applicable across diverse scenarios. We investigate two principal directions to enhance the learning capabilities of dexterous manipulation.

First, we leverage the inherent structural similarities between human and robotic hands, employing human data to guide robot manipulation skills. This approach is motivated by the bio-inspired design of dexterous hands, which offers a unique opportunity to learn from human demonstrations. To facilitate efficient data collection, we develop AnyTeleop, a general vision-based teleoperation system for dexterous robot arm-hand systems. AnyTeleop uses readily available devices such as web cameras to provide a versatile interface for teleoperating various arm-hand systems. Furthermore, we introduce CyberDemo, a data augmentation technique that expands the original human demonstrations, generating a dataset hundreds of times larger than the initial set. This approach allows for training policies capable of handling a wider range of scenarios without requiring additional human effort.

Second, we explore the potential of using vast amounts of simulated data to learn dexterous manipulation policies. The primary challenge in this direction lies in bridging the domain gap between simulation and the real world, encompassing both dynamics and visual discrepancies. This sim-to-real gap is particularly pronounced for high-DoF dexterous hands. To address this, we propose a sim-to-real reinforcement learning framework, DexPoint, that leverages point cloud and proprioceptive data. This framework integrates multi-modal sensory information into a unified 3D space, preserving the spatial relationships between robot components, sensors, and manipulated objects. This unified representation enables faster policy learning in simulation and smoother transfer to real-world applications.

UC San Diego
