eScholarship
Open Access Publications from the University of California

UC Santa Barbara

UC Santa Barbara Electronic Theses and Dissertations

Advancing AI Understanding in Language & Vision

Abstract

Large Language Models (LLMs) have emerged as powerful tools, demonstrating impressive capabilities in natural language generation. These pre-trained models consistently outperform prior approaches on benchmarks spanning a wide range of multi-modal tasks. However, this raises a crucial question: do LLMs truly understand and reason about the information they process, or are they simply advanced pattern recognizers? This thesis investigates the reasoning and understanding capabilities of language models, aiming to develop more context-aware and intelligent AI systems. First, we introduce WikiWhy, a benchmark designed to evaluate the reasoning capabilities of LLMs in answering and explaining cause-and-effect questions. Next, we present OCTO+, a state-of-the-art suite for automatic object placement in augmented reality, which leverages open-vocabulary Vision Language Models (VLMs) to integrate virtual content seamlessly. Finally, we propose the Visual Needle in a Haystack framework, which assesses the performance of VLMs in long-context reasoning and highlights their difficulties with distractor images. By addressing the limitations of long-context reasoning and promoting interpretability, this thesis seeks to unlock the full potential of LLMs and VLMs, enabling them to truly understand and reason about the world.
