Building 3D Foundation Models for the Embodied Minds
- Hong, Yining
- Advisor(s): Wu, Yingnian; Chang, Kai-Wei
Abstract
This thesis investigates the development of artificial intelligence systems with an embodied understanding of the three-dimensional world. Moving beyond current AI models that function as disembodied information processors, I present foundation models for machines that actively perceive, reason about, and interact with physical reality. The research progresses through four interconnected components that collectively bridge the gap between computational pattern recognition and embodied intelligence. "The Thinking Eye" enables visual systems to reason about physics and causality, developing models that go beyond mere pattern recognition. "The World inside 'I'" builds internal representations of 3D environments through novel descriptor field frameworks that allow machines to construct and update mental models of space through active exploration. In "The Thinking Body: Reasoning and Acting in the 3D World," I introduce a set of 3D-based large language models that integrate multiple sensory modalities while actively interacting with 3D environments through perception-action loops. The final part, "The Embodied Mind," examines how agents develop individual minds through accumulated experiences and how multiple agents with distinct histories can develop collaborative intelligence. Collectively, this research establishes a foundation for AI systems that understand the 3D world through grounded physical experience rather than through pattern recognition alone.