eScholarship
Open Access Publications from the University of California

UCLA

UCLA Electronic Theses and Dissertations

Advancing Mathematical Reasoning with Language Models: A Multimodal and Knowledge-Intensive Perspective

Abstract

Mathematical reasoning is a pivotal component of human intelligence, crucial for advancing education and science. This dissertation delves into the development of language model systems capable of robust mathematical reasoning, marking a significant step toward realizing general artificial intelligence. We introduce multimodal and knowledge-intensive benchmarks to assess the reasoning capabilities of large language models (LLMs) and vision-language models (VLMs) across real-world contexts, including visual information, tabular data, and scientific domains.

This dissertation advances the field by proposing new pre-trained VLMs. For instance, Patch-Trm introduces a patch-based cross-modal Transformer model for abstract diagram reasoning. We also present innovative retrieval and tool-augmented algorithms that enhance LLM capabilities. Notably, Inter-GPS is a neuro-symbolic solver for geometry that demonstrates human-level performance, marking a first in the domain. Additionally, PromptPG pioneers the use of reinforcement learning for dynamic in-context example selection, significantly improving the stability and accuracy of LLMs. Another groundbreaking contribution is Chameleon, a model that integrates LLMs with external tools, vastly increasing their flexibility and effectiveness in real-world applications. The dissertation concludes by analyzing the latest advances in mathematical reasoning within visual contexts and highlighting the current challenges and future prospects.
