Autonomous robots have rapidly transformed several industries: streamlining assembly lines in manufacturing for heightened efficiency, automating planting and harvesting in agriculture, and optimizing warehousing and delivery systems in logistics. These advances underscore the impact of robotics on productivity and innovation across a diverse range of sectors. However, progress in autonomous robotics faces a significant challenge: a heavy dependence on massive amounts of data and the difficulty of generalizing learned behaviors to new tasks and environments. This challenge is more than a technical obstacle; it is a critical bottleneck that constrains the widespread adoption and effectiveness of autonomous robots, limiting innovation and practical deployment across a wide range of industrial applications.
In this dissertation, we study the problem of data-efficient learning and generalization for industrial autonomous robots. Our goal is to develop algorithms that enable robots to learn from limited data and to generalize the learned behaviors effectively to novel tasks and environments. The core idea is to leverage appropriate task knowledge and assumptions, embedding them into the algorithmic designs to substantially improve data efficiency. Our research is dedicated to three principal aspects: data-efficient reward learning, data-efficient policy learning, and data-efficient policy generalization. These approaches are applied across a wide range of industrial scenarios, including autonomous vehicles, robotic assembly, and robotic palletization, demonstrating their versatility and effectiveness in improving robotic efficiency and adaptability in real-world applications.
This dissertation unfolds in three parts, offering a comprehensive examination of data-efficient learning and generalization for autonomous robots. Part I lays the foundation with an in-depth exploration of data-efficient reward learning, employing inverse reinforcement learning (Chapter 2) and representation learning (Chapter 3) to infer reward signals efficiently from limited data. Part II shifts focus to data-efficient policy learning. Chapter 4 introduces a data-efficient reinforcement learning (RL) policy designed for robotic palletization, which improves data efficiency by narrowing the exploration space via learned action space masking. In Chapter 5, we propose a novel skill representation method, namely motion primitives (MPs), alongside a data-efficient framework for learning MP-based insertion skills directly from human demonstrations. Part III advances to data-efficient policy generalization across diverse tasks. In Chapter 6, we present a novel zero-shot policy generalization approach that exploits the compositional structure of task representations to enable seamless adaptation to new tasks without the need for additional data.