Liang, Yiming; Amsili, Pascal; Burnett, Heather; Demberg, Vera

Uniform information density explains subject doubling in French

2024

Creative Commons 'BY' version 4.0 license

Abstract

In this paper we investigate whether subject doubling in French is affected by the Uniform Information Density (UID) principle, which states that speakers prefer encodings that minimize fluctuations in information density. We show that, with other factors controlled, speakers are more likely to double the NP subject when it has high surprisal. This provides further empirical evidence for the UID principle, which predicts a surprisal-redundancy trade-off as a property of natural languages. We argue for the importance of employing GPT-2 to investigate complex linguistic phenomena such as subject doubling: it enables the estimation of subject surprisal over a rather large conversational context, a task made possible by powerful language models that incorporate linguistic knowledge through pre-training on extensive datasets.
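
As a rough illustration of the quantity involved, surprisal is conventionally defined as the negative log probability of a word given its context. The sketch below is not the authors' pipeline, which estimates probabilities with GPT-2. The probabilities are hypothetical toy values, and `uid_variance` is an assumed, simple proxy for fluctuation in information density, not a measure taken from the paper.

```python
import math


def surprisal(prob: float) -> float:
    """Surprisal in bits of an event with conditional probability `prob`."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("prob must be in (0, 1]")
    return -math.log2(prob)


def surprisal_profile(probs):
    """Map a sequence of per-word probabilities to per-word surprisals."""
    return [surprisal(p) for p in probs]


def uid_variance(surprisals):
    """Population variance of surprisal: an assumed proxy for how unevenly
    information is spread over an utterance (lower = more uniform)."""
    mean = sum(surprisals) / len(surprisals)
    return sum((s - mean) ** 2 for s in surprisals) / len(surprisals)


# Hypothetical per-word probabilities for two three-word utterances.
# The second word stands in for the NP subject.
predictable_subject = [0.5, 0.5, 0.25]
unpredictable_subject = [0.5, 0.03125, 0.25]

flat = surprisal_profile(predictable_subject)
peaked = surprisal_profile(unpredictable_subject)

# The high-surprisal subject creates a peak in the profile, which is the
# kind of fluctuation that UID predicts speakers try to smooth out.
print(flat, uid_variance(flat))
print(peaked, uid_variance(peaked))
```

Under UID, a peak like the one in the second profile is where added redundancy, such as a doubled subject, would be expected to be more likely.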

Proceedings of the Annual Meeting of the Cognitive Science Society
