eScholarship
Open Access Publications from the University of California

Availability, informativity and burstiness: Why average corpus measures are an inaccurate guide to surprisal in language

Abstract

It has been proposed that Chinese classifiers facilitate efficient communication by reducing noun uncertainty in context. Although recent evidence has undermined this proposal, that evidence was obtained using the common method of equating noun occurrence probabilities with corpus frequencies. This method implicitly assumes that words occur uniformly across contexts, an assumption inconsistent with empirical findings showing word distributions to be bursty. We hypothesized that if language users are sensitive to burstiness, and if classifiers provide information about upcoming nouns, this information will be less important in reducing uncertainty about nouns after their first mention. We show that classifier usage provides more information at earlier mentions of nouns and less information at later mentions, and that the actual classifier distribution appears inconsistent with previous proposals. These results support the idea that classifiers facilitate efficient communication and indicate that language users' representations of lexical probabilities in context are dynamic.
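
The contrast between an average corpus measure and a burstiness-sensitive estimate can be illustrated with a toy sketch. It is not the authors' method: the word counts are invented, and the cache-style mixture (the `lam` weight and the function names) is an assumption chosen only to show how a noun's surprisal might fall after its first mention, while a plain corpus-frequency estimate stays fixed.

```python
import math
from collections import Counter


def average_surprisal(word, corpus_counts, total):
    """Surprisal in bits under a context-independent corpus frequency estimate."""
    return -math.log2(corpus_counts[word] / total)


def cache_surprisal(word, corpus_counts, total, history, lam=0.5):
    """Surprisal in bits when the corpus estimate is mixed with counts
    from the discourse so far (a hypothetical cache-style model)."""
    p_corpus = corpus_counts[word] / total
    if not history:
        # No discourse context yet: fall back to the corpus estimate alone.
        return -math.log2(p_corpus)
    p_cache = history.count(word) / len(history)
    p = (1 - lam) * p_corpus + lam * p_cache
    return -math.log2(p)


def surprisal_by_mention(discourse, corpus_counts, total, lam=0.5):
    """Return (word, surprisal) pairs, updating the discourse history word by word."""
    history = []
    out = []
    for w in discourse:
        out.append((w, cache_surprisal(w, corpus_counts, total, history, lam)))
        history.append(w)
    return out


# Invented toy counts, for illustration only.
corpus = Counter({"the": 50, "dog": 2, "cat": 2, "runs": 46})
total = sum(corpus.values())
discourse = ["the", "dog", "runs", "the", "dog"]

results = surprisal_by_mention(discourse, corpus, total)
for word, bits in results:
    print(f"{word:5s} {bits:.2f} bits")
print(f"corpus-average surprisal of 'dog': {average_surprisal('dog', corpus, total):.2f} bits")
```

Under these assumptions the second mention of "dog" carries less surprisal than the first, whereas the corpus-average figure is the same at every mention, which is the kind of mismatch the abstract attributes to treating corpus frequencies as occurrence probabilities.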
