Skip to main content
eScholarship
Open Access Publications from the University of California

Uniform information density explains subject doubling in French

Creative Commons 'BY' version 4.0 license
Abstract

In this paper we investigate whether subject doubling in French is affected by the Uniform Information Density (UID) principle, which states that speakers prefer language encoding that minimizes fluctuations in information density. We show that, other factors being controlled, speakers are more likely to double the NP subject when it has a high surprisal, thus providing further empirical evidence to the UID principle which predicts a surprisal-redundancy trade-off as a property of natural languages. We argue for the importance of employing GPT-2 to investigate complex linguistic phenomena such as subject doubling, as it enables the estimation of subject surprisal by considering a rather large conversational context, a task made possible by powerful language models that incorporate linguistic knowledge through pre-training on extensive datasets.

Main Content
For improved accessibility of PDF content, download the file to your device.
Current View