eScholarship
Open Access Publications from the University of California

UCLA

UCLA Electronic Theses and Dissertations

Building 3D Foundation Models for the Embodied Minds

Abstract

This thesis investigates the development of artificial intelligence systems with an embodied understanding of the three-dimensional world. Moving beyond current AI models that function as disembodied information processors, I present foundation models for machines that actively perceive, reason about, and interact with physical reality. The research progresses through four interconnected components that collectively bridge the gap between computational pattern recognition and embodied intelligence. "The Thinking Eye" enables visual systems to reason about physics and causality, developing models that go beyond mere pattern recognition. "The World inside 'I'" builds internal representations of 3D environments through novel descriptor field frameworks that allow machines to construct and update mental models of space through active exploration. In "The Thinking Body: Reasoning and Acting in the 3D World," I introduce a set of 3D-based large language models that integrate multiple sensory modalities while actively interacting with 3D environments through perception-action loops. The final part, "The Embodied Mind," examines how agents develop individual minds through accumulated experiences and how multiple agents with distinct histories can develop collaborative intelligence. Collectively, this research establishes a foundation for AI systems that understand the 3D world through grounded physical experience rather than pattern recognition alone.
