Building a Simulation Platform for Embodied AI
- Xiang, Fanbo
- Advisor(s): Su, Hao
Abstract
Embodied AI, the study of intelligent agents that interact with their surroundings to solve challenging tasks, is a crucial step toward building robotic agents that match or surpass human capabilities. While simulators offer scalable, reproducible, and safe methods for studying embodied AI, existing simulation frameworks often fall short in terms of features, extensibility, and the availability of simulation assets. In this dissertation, I present our efforts to advance simulation frameworks that improve performance and reduce sim-to-real domain gaps, as well as our methods for collecting 3D assets through annotation and 3D capture.
One of the most significant challenges in developing simulation environments for embodied AI and manipulation tasks is the scarcity of high-quality simulation assets. To address this, we create a large-scale articulated object dataset, PartNet-Mobility, and develop an accompanying simulator, SAPIEN, to support simulation and rendering of articulated-body manipulation environments. PartNet-Mobility and SAPIEN have enabled research on various problems, including part motion prediction and generalizable manipulation of articulated objects.
The lack of content in embodied AI simulations has motivated our exploration of 3D capture techniques for obtaining assets from real-world objects. We have developed the first neural capture technique capable of disentangling 2D appearance maps from 3D geometry, bringing neural capture closer to producing content directly usable in embodied AI simulators. In addition, our method can be extended to capture spatially varying material parameters under controlled lighting conditions, an essential feature for accurately rendering captured objects.
Finally, we provide significant enhancements to the SAPIEN simulator, transforming it into a comprehensive embodied AI simulation platform. This platform now includes a many-world rendering architecture that achieves state-of-the-art performance, heterogeneous parallel simulation to support the study of generalizable manipulation with fast GPU simulation, physics-grounded simulation for depth sensors with small sim-to-real domain gaps, and various usability features such as graphical interfaces and virtual reality support.