eScholarship
Open Access Publications from the University of California

Are the most frequent words the most useful? Investigating core vocabulary in reading

Abstract

High-frequency words are often assumed to be the most useful words for communication, as they provide the greatest coverage of texts. However, the relationship between text coverage and comprehension may not be straightforward: some words may provide more information than others. In this study, we explore alternative methods of defining core vocabulary in addition to word frequency (e.g., words that are central hubs in semantic association networks). We report the results of an empirical test of communicative utility using a text-based guessing game. We show that core words that reflect corpus-based distributional statistics (such as frequency or co-occurrence centrality) were less useful for communication than others. This was evident both in the size of the vocabulary that must be known and in the proportion of the text that must be covered for successful communication.
