Representing lexical ambiguity in prototype models of lexical semantics

Abstract

We show, contrary to some recent claims in the literature, that prototype distributional semantic models (DSMs) are capable of representing multiple senses of ambiguous words, including infrequent meanings. We propose that word2vec contains a natural, model-internal way of operationalizing the disambiguation process by leveraging the two sets of representations word2vec learns, instead of just one as most work on this model does. We evaluate our approach on artificial language simulations where other prototype DSMs have been shown to fail. We furthermore assess whether these results scale to the disambiguation of naturalistic corpus examples. We do so by replacing all instances of sampled pairs of words in a corpus with pseudo-homonym tokens, and testing whether models, after being trained on one half of the corpus, were able to disambiguate pseudo-homonyms on the basis of their linguistic contexts in the second half of the corpus. We observe that word2vec clearly surpasses the baseline of always guessing the most frequent meaning to be the right one. Moreover, it degrades gracefully: as a pseudo-homonym's two meanings become more unbalanced in frequency, the baseline rises and becomes harder to beat; nonetheless, word2vec still surpasses it, even for pseudo-homonyms whose most frequent meaning is much more frequent than the other.
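The abstract's core proposal is that word2vec's two weight matrices (the input, or target-word, embeddings and the output, or context-word, embeddings) jointly define how well a candidate word predicts an observed context. The abstract does not spell out the scoring rule or how candidate meanings are represented at test time, so the following is only a minimal sketch of one way to instantiate the idea, assuming gensim's skip-gram with negative sampling (where `model.wv` holds the input vectors and `model.syn1neg` the output vectors) and assuming each candidate meaning has its own vocabulary entry, as it might in an artificial-language simulation:

```python
from gensim.models import Word2Vec

def context_score(model: Word2Vec, candidate: str, context_words) -> float:
    """Sum of dot products between the candidate's input vector (model.wv)
    and the output vectors (model.syn1neg) of the observed context words.
    Under skip-gram with negative sampling, a higher sum means the model
    assigns this context a higher probability given the candidate."""
    v = model.wv[candidate]  # input (target-word) embedding
    ids = [model.wv.key_to_index[w] for w in context_words
           if w in model.wv.key_to_index]
    if not ids:
        return float("-inf")
    u = model.syn1neg[ids]   # output embeddings, row-aligned with model.wv
    return float((u @ v).sum())

def disambiguate(model: Word2Vec, meanings, context_words) -> str:
    """Pick whichever candidate meaning best predicts the observed context."""
    return max(meanings, key=lambda m: context_score(model, m, context_words))
```

The key design point, as we read the abstract, is that disambiguation needs no external machinery: the input-output dot product is exactly the quantity skip-gram is trained to calibrate, which is what makes the mechanism "model-internal".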
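To make the evaluation setup concrete, here is a minimal sketch of the pseudo-homonym manipulation as described in the abstract: every occurrence of either member of a sampled word pair is rewritten as a single joint token, and the original word is kept as the gold label for later testing. The `w1_w2` token spelling and the half-split shown in the trailing comments are illustrative assumptions, not necessarily the authors' exact procedure:

```python
def make_pseudo_homonym_corpus(sentences, word_pairs):
    """Merge each sampled pair (w1, w2) into one pseudo-homonym token,
    recording where each replacement happened and what the original
    word (i.e., the true meaning) was."""
    merged = {w: f"{w1}_{w2}" for (w1, w2) in word_pairs for w in (w1, w2)}
    corpus, gold = [], []
    for s_idx, sent in enumerate(sentences):
        new_sent = []
        for t_idx, tok in enumerate(sent):
            if tok in merged:
                gold.append((s_idx, t_idx, tok))  # position + true meaning
                new_sent.append(merged[tok])
            else:
                new_sent.append(tok)
        corpus.append(new_sent)
    return corpus, gold

# Illustrative use, following the abstract's design: train on one half of
# the manipulated corpus, then test disambiguation of pseudo-homonym
# occurrences against the gold labels in the other half.
# half = len(corpus) // 2
# model = Word2Vec(corpus[:half], sg=1, negative=5)
```

Because the gold labels come for free from the replacement step, this design yields a controlled disambiguation test on naturalistic text, including the frequency-imbalance analysis the abstract reports.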
