Evaluating Vector-Space Models of Word Representation, or, The Unreasonable Effectiveness of Counting Words Near Other Words
Abstract
Vector-space models of semantics represent words as continuously-valued vectors and measure similarity based on the distance or angle between those vectors. Such representations have become increasingly popular due to the recent development of methods that allow them to be efficiently estimated from very large amounts of data. However, the idea of relating similarity to distance in a spatial representation has been criticized by cognitive scientists, as human similarity judgments have many properties that are inconsistent with the geometric constraints that a distance metric must obey. We show that two popular vector-space models, Word2Vec and GloVe, are unable to capture certain critical aspects of human word association data as a consequence of these constraints. However, a probabilistic topic model estimated from a relatively small curated corpus qualitatively reproduces the asymmetric patterns seen in the human data. We also demonstrate that a simple co-occurrence frequency performs similarly to reduced-dimensionality vector-space models on medium-size corpora, at least for relatively frequent words.
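The geometric constraint at issue can be made concrete: cosine similarity, the angle-based measure typically used with these models, is symmetric by construction, so sim(a, b) = sim(b, a) for any pair of vectors. Human word association, by contrast, is directional (the strength of the association from one word to another need not equal the reverse). A minimal sketch, using toy vectors that are purely illustrative rather than drawn from any trained model:

```python
import math

def cosine(u, v):
    """Cosine similarity: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical 3-dimensional "word vectors" for illustration only.
vec_tiger = [0.9, 0.1, 0.3]
vec_cat = [0.7, 0.5, 0.2]

# Symmetry is forced by the geometry: the angle between two vectors
# does not depend on which one you start from.
s1 = cosine(vec_tiger, vec_cat)
s2 = cosine(vec_cat, vec_tiger)
assert math.isclose(s1, s2)
```

No choice of vectors can break this symmetry, which is why models built on such measures cannot, in principle, reproduce asymmetric association data; the same argument applies to any proper distance metric, since metrics satisfy d(a, b) = d(b, a) by definition.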