Automatic Estimation of Lexical Concreteness in 77 Languages
Abstract
We estimate lexical Concreteness for millions of words across 77 languages. Using a simple regression framework, we combine vector-based models of lexical semantics with experimental norms of Concreteness in English and Dutch. By applying techniques to align vector-based semantics across distinct languages, we compute and release Concreteness estimates at scale in numerous languages for which experimental norms are not currently available. This paper lays out the technique and its efficacy. Although this is a difficult dataset to evaluate immediately, Concreteness estimates computed from English correlate with Dutch experimental norms at ρ = .75 in the vocabulary at large, increasing to ρ = .8 among Nouns. Our predictions also recapitulate attested relationships with word frequency. The approach we describe can be readily applied to numerous lexical measures beyond Concreteness.
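As a rough illustration of the pipeline summarized above, the following sketch fits a regression from word vectors to norms in one language, aligns a second language's vectors into the same space, and scores the transferred predictions with Spearman's ρ. It is a toy under stated assumptions, not the implementation used in this paper: the data are synthetic, and the choice of ridge regression, orthogonal Procrustes alignment, and the helper names `fit_ridge`, `procrustes_align`, and `spearman` are all hypothetical stand-ins for the "simple regression framework" and alignment techniques mentioned in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumptions): 200 source-language word vectors of
# dimension 20, with a made-up linear "concreteness" signal plus noise.
n_words, dim = 200, 20
X_src = rng.normal(size=(n_words, dim))
true_w = rng.normal(size=dim)
y_src = X_src @ true_w + 0.1 * rng.normal(size=n_words)


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression: w = (X^T X + alpha I)^-1 X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ y)


def procrustes_align(src, tgt):
    """Orthogonal map R minimizing ||src @ R - tgt||_F, obtained via SVD."""
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt


def spearman(a, b):
    """Spearman's rho as the Pearson correlation of ranks (assumes no ties)."""
    ranks_a = np.argsort(np.argsort(a))
    ranks_b = np.argsort(np.argsort(b))
    return np.corrcoef(ranks_a, ranks_b)[0, 1]


# A second "language": the same space under a hidden random rotation.
Q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
X_tgt = X_src @ Q
y_tgt = y_src  # toy assumption: translation pairs share a concreteness value

# Learn the alignment from a small seed dictionary (the first 50 word pairs).
R = procrustes_align(X_tgt[:50], X_src[:50])

# Fit on the source language, then predict for held-out target-language words.
w = fit_ridge(X_src, y_src)
pred_tgt = (X_tgt @ R) @ w
rho = spearman(pred_tgt[50:], y_tgt[50:])
print(f"held-out Spearman rho: {rho:.3f}")
```

Because the toy target space is an exact rotation of the source space, the alignment here is recovered almost perfectly; real cross-lingual embeddings are only approximately isomorphic, so correlations on experimental norms would be expected to be lower than in this sketch.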