eScholarship
Open Access Publications from the University of California

Simplicity Bias in Human-generated data

Creative Commons 'BY' version 4.0 license
Abstract

Texts available on the Web have been generated by human minds. We observe that simple patterns are over-represented: the string abcdef is more frequent than arfbxg, and the number 1000 appears more often than 1282. We suggest that word frequency patterns can be predicted by cognitive models based on complexity minimization. Conversely, observed word frequencies offer an opportunity to infer the particular cognitive mechanisms involved in their generation.
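
As a rough illustration of the contrast drawn in the abstract, the sketch below scores a string by how many distinct "steps" separate its consecutive characters. This `step_complexity` function is a hypothetical toy proxy introduced here for illustration only; it is not the cognitive model or the complexity measure used in the paper, and it says nothing about actual Web frequencies.

```python
def step_complexity(s: str) -> int:
    """Toy complexity proxy (an assumption, not the paper's measure).

    Counts the distinct differences between consecutive code points.
    A string built by repeating one rule (e.g. "add 1") needs only one
    step to describe, so it scores low; an irregular string scores high.
    """
    diffs = {ord(b) - ord(a) for a, b in zip(s, s[1:])}
    return len(diffs)


if __name__ == "__main__":
    # The four examples quoted in the abstract.
    for token in ["abcdef", "arfbxg", "1000", "1282"]:
        print(token, step_complexity(token))
```

Under this proxy, abcdef scores lower than arfbxg and 1000 scores lower than 1282, which matches the direction of the simplicity bias described above. A more serious treatment would replace the proxy with a principled description-length estimate and compare it against measured frequencies.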
