eScholarship
Open Access Publications from the University of California

Comparing predictions of lexical norm data obtained using word associations and word collocation

Abstract

We compared the quality of predictions of word variables based on a Dutch word association corpus and a text corpus. We derived estimates of valence, arousal, dominance, concreteness, and age of acquisition (AoA) for 2831 words. Based on the similarity between words, we used two methods: (1) projections onto a dimension identified as the variable in question in a multidimensional representation, and (2) the values of the k nearest neighbors, weighted according to their proximity. Estimates were better when based on word associations. Differences between the predictions of the two methods were small. The word association corpus yielded correlations of .92, .85, and .85 for valence, arousal, and dominance, respectively. The corresponding correlations based on the text corpus were .80, .74, and .67. For concreteness and AoA, both the association corpus and the text corpus yielded correlations of .88 and .73, respectively. This suggests that word associations are better at capturing human ratings of affective word variables.
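
As an illustration of the second method, the following is a minimal sketch of a proximity-weighted k-nearest-neighbors estimate. It rests on assumptions that the abstract does not state: words are represented as rows of a hypothetical feature matrix X, similarity is measured with the cosine, and the k retained similarities are positive and used directly as weights. The function names and the toy data are invented for the example and are not taken from the paper.

```python
import numpy as np

def cosine_similarity_matrix(X):
    """Pairwise cosine similarity between the rows of X (one row per word)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.where(norms == 0, 1.0, norms)  # guard against zero vectors
    return Xn @ Xn.T

def knn_estimate(sim, ratings, target, k):
    """Estimate the rating of word `target` from its k most similar words.

    Assumption: the k retained similarities are positive, so they can
    serve directly as weights in a weighted mean.
    """
    s = sim[target].astype(float)  # astype returns a copy
    s[target] = -np.inf            # never use the word's own rating
    neighbors = np.argsort(s)[-k:] # indices of the k most similar words
    weights = s[neighbors]
    return float(np.dot(weights, ratings[neighbors]) / weights.sum())

# Toy data (hypothetical): four words in a two-dimensional space.
X = np.array([[1.0, 0.0],
              [0.9, 0.1],
              [0.0, 1.0],
              [0.1, 0.9]])
ratings = np.array([np.nan, 6.0, 2.0, 2.5])  # rating of word 0 is unknown

sim = cosine_similarity_matrix(X)
estimate = knn_estimate(sim, ratings, target=0, k=2)
print(round(estimate, 2))
```

In a comparison like the one described, such estimates would be computed for every word in turn and then correlated with the human ratings.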
