
Do large language models resolve semantic ambiguities in the same way as humans? The case of word segmentation in Chinese sentence reading

Creative Commons 'BY' version 4.0 license
Abstract

Large language models (LLMs) were trained to predict words without the explicit semantic word representations that humans have. Here we compared LLMs and humans in resolving semantic ambiguities at the word/token level by examining the case of segmenting overlapping ambiguous strings in Chinese sentence reading, where three characters "ABC" can be segmented as either "AB/C" or "A/BC" depending on the context. We showed that although LLMs performed worse than humans, they demonstrated a similar interaction effect between segmentation structure and word frequency order, suggesting that this effect observed in humans could be accounted for by statistical learning of word/token occurrence regularities without assuming an explicit semantic word representation. Nevertheless, across stimuli LLMs' responses were not correlated with any human performance or eye movement measures, suggesting differences in the underlying processing mechanisms. Thus, it is essential to understand these differences through explainable AI (XAI) methods to facilitate LLM adoption.
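
To make the frequency-based account concrete, below is a minimal sketch (not the paper's method or stimuli) of how word occurrence regularities alone can favor one segmentation of an overlapping ambiguous string over the other. The string "美国会" is a standard textbook example of overlapping ambiguity ("美国/会" vs. "美/国会"), and the frequency counts are invented for illustration.

```python
# Toy unigram segmenter: chooses between the two segmentations of an
# overlapping ambiguous string "ABC" purely from word-frequency statistics,
# with no semantic representation involved.

import math

# Hypothetical relative frequencies (illustrative only; not real corpus data).
word_freq = {
    "美国": 900.0,   # "United States"  (AB)
    "国会": 300.0,   # "congress"       (BC)
    "美": 120.0,     # "beautiful"      (A)
    "会": 800.0,     # "will / meeting" (C)
}

def log_prob(word: str) -> float:
    """Unigram log-probability with a small floor for unseen words."""
    total = sum(word_freq.values())
    return math.log(word_freq.get(word, 0.01) / total)

def segment_abc(a: str, b: str, c: str):
    """Score the two possible segmentations of an ambiguous string ABC."""
    score_ab_c = log_prob(a + b) + log_prob(c)   # "AB/C"
    score_a_bc = log_prob(a) + log_prob(b + c)   # "A/BC"
    choice = f"{a + b}/{c}" if score_ab_c >= score_a_bc else f"{a}/{b + c}"
    return choice, score_ab_c, score_a_bc

if __name__ == "__main__":
    # "美国会" can be read as "美国/会" (AB/C) or "美/国会" (A/BC).
    choice, s1, s2 = segment_abc("美", "国", "会")
    print(f"AB/C score: {s1:.2f}  A/BC score: {s2:.2f}  -> chosen: {choice}")
```

With these made-up counts the segmenter prefers "美国/会"; shifting the relative frequencies (e.g., raising "国会" and "美") flips the preference to "美/国会", which is the kind of structure-by-frequency interaction the abstract attributes to statistical learning of occurrence regularities.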
