Autonomous robots have the potential to play a critical role in many aspects of modern life, including search and rescue, autonomous driving, medical surgery, and agriculture. Reinforcement learning algorithms allow intelligent agents to discover optimal behavior through trial-and-error interactions with the environment, and they have been successfully applied to playing video games, mastering the game of Go, and training large language models. In robotics, this data-driven learning approach is also promising for locomotion, manipulation, and navigation.
When demonstrations are available, an agent can learn to perform a task by imitating expert behavior. However, the agent must generalize to novel scenarios that were not seen during training. This thesis introduces two approaches to learning generalizable policies from demonstrations. The first infers a cost function from the semantic and geometric information in observations and generalizes to unseen, dynamic, partially observable simulated environments in autonomous driving scenarios. The second infers task logic from demonstrations, which is in turn used as a set of constraints for motion planning. By exploiting the hierarchical logical structure of demonstrated trajectories, it generalizes to sequential, compositional planning problems.
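To make the first approach concrete, the following is a minimal, hypothetical sketch of how semantic and geometric observations might be combined into a cost map that a motion planner could then minimize along candidate trajectories. The class labels, per-class costs, and weighting here are illustrative assumptions, not the learned cost function developed in this thesis.

```python
# Hypothetical sketch: fusing semantic and geometric observations into a
# cost map for planning. All class ids, costs, and weights are assumptions.
import numpy as np

def cost_map(semantic_labels: np.ndarray, obstacle_distance: np.ndarray) -> np.ndarray:
    """Combine per-cell semantic costs with a geometric proximity penalty.

    semantic_labels: integer class id per grid cell (e.g., 0=road, 1=sidewalk, 2=car).
    obstacle_distance: distance in meters from each cell to the nearest obstacle.
    """
    # Illustrative per-class traversal costs; a learned model would predict these.
    class_cost = np.array([0.0, 5.0, 100.0])
    semantic_cost = class_cost[semantic_labels]
    # Geometric penalty that decays with distance to the nearest obstacle.
    proximity_cost = np.exp(-obstacle_distance)
    return semantic_cost + 10.0 * proximity_cost

# A planner would minimize the accumulated cost along candidate trajectories.
labels = np.random.randint(0, 3, size=(8, 8))
dist = np.random.uniform(0.0, 5.0, size=(8, 8))
print(cost_map(labels, dist).shape)  # (8, 8)
```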
Another challenge in deploying robots in the real world is bridging the simulation-to-reality gap. While simulation provides training data at low cost, a policy must account for mismatches in sensing and actuation when deployed on real robots. To address these challenges, this dissertation introduces a latent space alignment approach in which policies trained on a source robot can be adapted to a target robot with a different embodiment. Finally, this dissertation presents a sim-to-real method for throwing and catching objects with bimanual robots, in which the two arms must cooperate precisely to interact with diverse objects at high speed.
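As an illustration of the latent space alignment idea, the sketch below aligns latent state features of a source and a target robot with an orthogonal map, solved in closed form via Procrustes analysis, so that a policy trained in the source latent space can consume mapped target features. The paired features, latent dimension, and choice of an orthogonal map are assumptions made for illustration, not the exact procedure of this dissertation.

```python
# Hypothetical sketch: aligning two robots' latent state spaces so a policy
# trained on the source robot can act on target-robot observations.
# Assumes paired latent features from corresponding states are available.
import numpy as np

rng = np.random.default_rng(0)
d = 16   # shared latent dimension (assumed)
n = 500  # number of paired states (assumed)

# Stand-ins for encoder outputs on corresponding states of the two robots.
Z_src = rng.normal(size=(n, d))
true_rotation = np.linalg.qr(rng.normal(size=(d, d)))[0]
Z_tgt = Z_src @ true_rotation.T + 0.01 * rng.normal(size=(n, d))

# Orthogonal Procrustes: W = argmin ||Z_tgt W - Z_src||_F over orthogonal W,
# solved in closed form from the SVD of the cross-covariance matrix.
U, _, Vt = np.linalg.svd(Z_tgt.T @ Z_src)
W = U @ Vt

# Target observations mapped through W now live in the source latent space,
# where the pretrained source policy can act on them directly.
print(np.linalg.norm(Z_tgt @ W - Z_src) / np.linalg.norm(Z_src))  # small residual
```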