Evaluating Vector-Space Models of Word Representation, or, The Unreasonable Effectiveness of Counting Words Near Other Words
Abstract
Vector-space models of semantics represent words as continuously-valued vectors and measure similarity based on the distance or angle between those vectors. Such representations have become increasingly popular due to the recent development of methods that allow them to be efficiently estimated from very large amounts of data. However, the idea of relating similarity to distance in a spatial representation has been criticized by cognitive scientists, as human similarity judgments have many properties that are inconsistent with the geometric constraints that a distance metric must obey. We show that two popular vector-space models, Word2Vec and GloVe, are unable to capture certain critical aspects of human word association data as a consequence of these constraints. However, a probabilistic topic model estimated from a relatively small curated corpus qualitatively reproduces the asymmetric patterns seen in the human data. We also demonstrate that a simple co-occurrence frequency performs similarly to reduced-dimensionality vector-space models on medium-size corpora, at least for relatively frequent words.
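The geometric constraint at issue can be made concrete: cosine similarity, the angle-based measure typically used with these models, is symmetric by construction, so sim(a, b) = sim(b, a) for any pair of vectors. Human word association, by contrast, is directional (the strength of the association from one word to another need not equal the reverse). A minimal sketch, using toy vectors that are purely illustrative rather than drawn from any trained model:

```python
import math

def cosine(u, v):
    """Cosine similarity: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical 3-dimensional "word vectors" for illustration only.
vec_tiger = [0.9, 0.1, 0.3]
vec_cat = [0.7, 0.5, 0.2]

# Symmetry is forced by the geometry: the angle between two vectors
# does not depend on which one you start from.
s1 = cosine(vec_tiger, vec_cat)
s2 = cosine(vec_cat, vec_tiger)
assert math.isclose(s1, s2)
```

No choice of vectors can break this symmetry, which is why models built on such measures cannot, in principle, reproduce asymmetric association data; the same argument applies to any proper distance metric, since metrics satisfy d(a, b) = d(b, a) by definition.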