Designing agents that autonomously acquire skills to complete tasks in their environments has been an active research topic for decades. The complete realization of this vision remains elusive, yet research pursued toward it has yielded tremendous scientific and technological advances. This thesis addresses three research areas that are key to progress on this vision.
The first area is deep Reinforcement Learning (RL), where we develop new algorithms for both online and offline RL. More specifically, we propose an experimental setting in which we demonstrate that pre-training policies on offline datasets can significantly improve the sample efficiency of online learning on unseen tasks (up to $80\%$ on standard benchmarks). The second contribution in this area is a novel offline RL algorithm based on Generative Adversarial Networks. In contrast to recent algorithms that enforce distribution constraints, we use a dual-generator formulation to enforce support constraints, leading to improved performance. The method outperforms recent state-of-the-art algorithms on tasks that require stitching sub-optimal trajectories to learn performant behavior.
The second area is human-machine interfaces for human supervision, e.g., for collecting demonstrations for robotic manipulation. Using a single RGB-D camera as the sensing device to capture human motion in real time, we demonstrate that our teleoperation system allows a human operator to successfully control a 6-degree-of-freedom manipulator to complete complex tasks, such as peg-in-hole insertion and cloth folding.
The third area is the automatic construction of simulated environments for training deep neural networks. We show the benefit of our framework on the task of grasping objects in clutter using 6-degree-of-freedom grasps. Using only 30 reconstructed scenes and thousands of grasp labels, a state-of-the-art grasping network architecture trained on our reconstructions outperforms, by 11%, the publicly released pre-trained model that was trained with 17.7 million grasp labels.