Representing lexical ambiguity in prototype models of lexical semantics
Abstract
We show, contrary to some recent claims in the literature, that prototype distributional semantic models (DSMs) are capable of representing multiple senses of ambiguous words, including infrequent meanings. We propose that word2vec contains a natural, model-internal way of operationalizing the disambiguation process by leveraging the two sets of representations word2vec learns, instead of just one as most work on this model does. We evaluate our approach on artificial language simulations where other prototype DSMs have been shown to fail. We furthermore assess whether these results scale to the disambiguation of naturalistic corpus examples. We do so by replacing all instances of sampled pairs of words in a corpus with pseudo-homonym tokens, and testing whether models, after being trained on one half of the corpus, were able to disambiguate pseudo-homonyms on the basis of their linguistic contexts in the second half of the corpus. We observe that word2vec clearly surpasses the baseline of always guessing the most frequent meaning to be the right one. Moreover, it degrades gracefully: as a word's meanings become more unbalanced in frequency, the baseline rises and is harder to surpass; nonetheless, word2vec succeeds at surpassing it, even for pseudo-homonyms whose most frequent meaning is much more frequent than the other.
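To make the "two sets of representations" concrete: a skip-gram word2vec model trained with negative sampling maintains both input (word) embeddings and output (context) embeddings, and their dot product is exactly the quantity training raises for attested word-context pairs. The following is a minimal sketch, not the authors' exact procedure, of how both matrices could be used to score which candidate meaning best predicts an observed context. It assumes a gensim model, where the input vectors live in model.wv and the output vectors in model.syn1neg (gensim's name for the output-side matrix under negative sampling); the candidate sense tokens passed in are hypothetical.

```python
from gensim.models import Word2Vec

def context_score(model: Word2Vec, sense_word: str, context: list[str]) -> float:
    """Sum of dot(input(sense), output(c)) over context words c: the score
    the skip-gram objective raises for attested (word, context) pairs."""
    v_in = model.wv[sense_word]  # input embedding of the candidate sense
    idxs = [model.wv.key_to_index[c] for c in context
            if c in model.wv.key_to_index]
    return float((model.syn1neg[idxs] @ v_in).sum())  # output embeddings

def disambiguate(model: Word2Vec, senses: list[str], context: list[str]) -> str:
    """Pick the candidate sense whose input vector best predicts the context."""
    return max(senses, key=lambda s: context_score(model, s, context))
```

For instance, disambiguate(model, ["bank#finance", "bank#river"], "the money was deposited at the".split()) would return whichever of the two (hypothetical) sense tokens scores the observed context higher; using the output matrix here, rather than measuring input-input cosine similarity, is what distinguishes this from the usual single-matrix use of word2vec.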
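The pseudo-homonym manipulation described above is likewise simple to state procedurally. Below is a minimal sketch, under assumptions: the corpus is an iterable of token lists, every occurrence of either word in a sampled pair is replaced by a single merged token, and the token-naming scheme is illustrative rather than the authors'.

```python
def merge_pair(corpus: list[list[str]], w1: str, w2: str) -> list[list[str]]:
    """Replace all instances of w1 and w2 with one pseudo-homonym token,
    so a model must recover the intended meaning from context alone."""
    pseudo = f"{w1}_{w2}"  # e.g. "piano_banana"; hypothetical naming scheme
    return [[pseudo if tok in (w1, w2) else tok for tok in sentence]
            for sentence in corpus]
```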