
Evaluating Vector-Space Models of Word Representation, or, The Unreasonable Effectiveness of Counting Words Near Other Words

Abstract

Vector-space models of semantics represent words as continuously-valued vectors and measure similarity based on the distance or angle between those vectors. Such representations have become increasingly popular due to the recent development of methods that allow them to be efficiently estimated from very large amounts of data. However, the idea of relating similarity to distance in a spatial representation has been criticized by cognitive scientists, as human similarity judgments have many properties that are inconsistent with the geometric constraints that a distance metric must obey. We show that two popular vector-space models, Word2Vec and GloVe, are unable to capture certain critical aspects of human word association data as a consequence of these constraints. However, a probabilistic topic model estimated from a relatively small curated corpus qualitatively reproduces the asymmetric patterns seen in the human data. We also demonstrate that a simple co-occurrence frequency measure performs similarly to reduced-dimensionality vector-space models on medium-size corpora, at least for relatively frequent words.
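To make the geometric constraint concrete: cosine similarity, the standard angle-based measure used with Word2Vec and GloVe vectors, is symmetric by construction, whereas a conditional co-occurrence frequency can differ by direction, as human word-association norms do. The sketch below is a minimal illustration of that contrast; the function names, toy counts, and example words are hypothetical and are not taken from the paper or its corpora.

```python
import numpy as np
from collections import defaultdict

def cosine_similarity(u, v):
    """Cosine of the angle between two word vectors.
    Symmetric by construction: cosine_similarity(u, v) == cosine_similarity(v, u),
    so it cannot reproduce asymmetric association judgments."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def association_strength(counts, cue, target):
    """Conditional co-occurrence frequency: how often `target` appears
    near `cue`, normalized by all of `cue`'s co-occurrences. Unlike a
    distance or angle, this can be asymmetric across directions."""
    total = sum(counts[cue].values())
    return counts[cue][target] / total if total else 0.0

# Toy co-occurrence counts (hypothetical numbers, for illustration only).
counts = defaultdict(lambda: defaultdict(int))
counts["tiger"]["animal"] = 50   # "tiger" mostly occurs near "animal"
counts["animal"]["tiger"] = 50   # raw pair counts are symmetric...
counts["animal"]["dog"] = 400    # ...but "animal" co-occurs with many
counts["animal"]["cat"] = 350    # other words, so the conditional
                                 # strengths differ by direction.

print(association_strength(counts, "tiger", "animal"))  # 50/50  = 1.0
print(association_strength(counts, "animal", "tiger"))  # 50/800 = 0.0625
```

The asymmetry falls out of the normalization: dividing by the cue's total co-occurrences means a rare word can be strongly associated with a frequent one without the reverse holding, which a symmetric distance metric rules out.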
