eScholarship
Open Access Publications from the University of California

UC San Diego

UC San Diego Electronic Theses and Dissertations

Learning Generalist Robot Manipulation Policies

Abstract

The pursuit of Artificial General Intelligence calls for intelligent agents with a "body" that can interact with and learn from their environments, a goal central to Embodied AI. Despite the remarkable success of data-driven approaches in learning specialized skills for individual tasks, learning generalist robot manipulation policies, which master generalizable skills across a wide range of tasks, remains challenging. In this dissertation, we present our efforts to develop scalable simulation systems and to explore effective representations that facilitate the learning of generalist robot policies.

One major challenge is the high cost and inefficiency of collecting high-quality, diverse demonstration data in the real world. Simulations, serving as proxies for the real world, are more affordable and accessible, allowing us to scale up demonstration collection and policy evaluation more easily. To this end, we develop ManiSkill2, a simulation benchmark for generalizable manipulation skills. The platform features over 2,000 objects and 4 million demonstration frames across 20 out-of-the-box task families. We also provide a wide range of baselines and host a public leaderboard for the community to evaluate object-level generalization of manipulation skills.

Making full use of available demonstration data requires suitable representations that enable robots to adapt to a broad spectrum of tasks. We propose RT-Trajectory, which aims to improve task-level generalization by leveraging existing demonstration datasets with a novel form of policy conditioning: a coarse trajectory sketch. This sketch outlines the desired motion of the robot's end-effector, allowing the policy to adapt, in a promptable way, to unseen tasks with novel semantics and movements.
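To make the idea of a "coarse" sketch concrete, one simple picture is a handful of waypoints that summarize a dense end-effector path. The code below is a hypothetical illustration only, not RT-Trajectory's actual procedure: it assumes the path is given as 2D coordinates and coarsens it by resampling a fixed number of points spaced evenly by arc length. The names `coarse_sketch` and `num_waypoints` are our own.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def coarse_sketch(path: List[Point], num_waypoints: int = 8) -> List[Point]:
    """Hypothetical coarsening: resample a dense 2D path to evenly spaced waypoints."""
    if len(path) < 2 or num_waypoints < 2:
        raise ValueError("need at least 2 path points and 2 waypoints")

    # Cumulative arc length at each dense path point.
    cum = [0.0]
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cum[-1]

    sketch: List[Point] = []
    seg = 0
    for k in range(num_waypoints):
        target = total * k / (num_waypoints - 1)
        # Advance to the segment [cum[seg], cum[seg + 1]] that contains target.
        while seg < len(path) - 2 and cum[seg + 1] < target:
            seg += 1
        span = cum[seg + 1] - cum[seg]
        t = 0.0 if span == 0.0 else (target - cum[seg]) / span
        t = min(max(t, 0.0), 1.0)
        # Linearly interpolate within the segment.
        (x0, y0), (x1, y1) = path[seg], path[seg + 1]
        sketch.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return sketch
```

For an L-shaped path `[(0, 0), (10, 0), (10, 10)]` with `num_waypoints=5`, this returns `(0, 0), (5, 0), (10, 0), (10, 5), (10, 10)`: the corner survives while the dense detail is discarded, which is the kind of loose motion outline a sketch is meant to convey.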

Moreover, in Multi-skill Mobile Manipulation (M3), we study a modular approach to long-horizon mobile manipulation that decomposes a full task into a sequence of subtasks, each solved by one of several chained manipulation and navigation skills. We demonstrate how subtask definitions significantly shape skill quality and utility in the context of skill chaining. Accordingly, we redefine stationary manipulation and point-goal navigation skills as more versatile mobile manipulation and region-goal navigation skills.
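Why subtask definitions matter for chaining can be shown with a toy example. Everything below is hypothetical and not taken from M3: the 1-D world, the `Skill` and `chain_skills` names, and the tolerances are our own. Each skill starts from whatever state the previous skill left behind, so a skill that only works from a narrow set of starting states can break the chain even when the preceding skill reports success.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Toy 1-D world state: (robot base position, object held?).
State = Tuple[float, bool]


@dataclass
class Skill:
    name: str
    run: Callable[[State], Tuple[State, bool]]  # returns (new state, success flag)


def chain_skills(skills: List[Skill], state: State) -> Tuple[State, List[str]]:
    """Run skills in sequence; each one starts from the previous skill's end state."""
    completed: List[str] = []
    for skill in skills:
        state, ok = skill.run(state)
        if not ok:
            break  # a failed subtask breaks the chain
        completed.append(skill.name)
    return state, completed


OBJECT_POS = 5.0


def navigate(state: State) -> Tuple[State, bool]:
    # Hypothetical navigation that stops near, not exactly at, the object.
    _, held = state
    return (OBJECT_POS - 0.5, held), True


def stationary_pick(state: State) -> Tuple[State, bool]:
    # Cannot move the base, so it only grasps if already within 0.1 of the object.
    pos, _ = state
    ok = abs(pos - OBJECT_POS) <= 0.1
    return (pos, ok), ok


def mobile_pick(state: State) -> Tuple[State, bool]:
    # May also move the base, so it tolerates starting within 1.0 of the object.
    pos, _ = state
    if abs(pos - OBJECT_POS) <= 1.0:
        return (OBJECT_POS, True), True
    return (pos, False), False
```

Starting from `(0.0, False)`, chaining `navigate` with `stationary_pick` stops after `navigate` with the object not held, while chaining `navigate` with `mobile_pick` completes both subtasks. The two picks differ only in which starting states they accept.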
