Advancing the Cognitive Abilities of Embodied Agents: Large-Scale Simulations and Multi-Agent Collaborations

Abstract

To construct a general artificial intelligence system, embodied agents must be able to perceive their environment, understand human language, engage in complex reasoning, manipulate objects, and collaborate with humans and each other. Cognitive science research suggests that intelligence emerges from sensorimotor experiences and interactions with the physical world. However, learning active perception and sensorimotor control through interaction with the physical environment can be challenging because existing algorithms are too slow for real-time learning, and embodied agents are fragile and expensive. Consequently, there is a pressing need for virtual simulation systems that can mimic complex behaviors and facilitate agent-environment interactions. In addition to mastering basic physical skills, embodied agents also need to engage in long-horizon task planning, coordination, and abstract reasoning to be effective in real-world scenarios.

The first line of research reported in this thesis focuses on developing simulation environments in which robots can interact with human users and their surroundings. We introduce a new simulation environment, VRKitchen, which enables the simulation of complex high-level behaviors and state changes. We also collect a dataset featuring human-environment interactions to predict human intentions. Furthermore, we develop a new system, ARNOLD, to simulate intricate low-level physics, including articulated objects and liquids. Using the ARNOLD dataset, we assess the abilities of robots to comprehend human language and execute complex manipulations under varied visual conditions, thereby evaluating their generalization capabilities in diverse and novel environments.

The second line of research addresses multi-agent collaboration and task allocation. We examine how robots of various types can cooperate with each other or with human users to accomplish common tasks. First, we propose a joint mind modeling framework based on the theory of mind to enhance collaboration between humans and robots. Next, we create a suite of multi-robot vision-based collaboration tasks, LEMMA, in which robots positioned around a tabletop must cooperate, using tools where necessary, to complete a task specified by high-level instructions. Lastly, leveraging large language models, we introduce a centralized multi-agent dispatcher framework, MindAgent, along with its associated benchmarks and infrastructure.
