Improving a Fundamental Measure of Lexical Association

Abstract

Pointwise mutual information (PMI), a simple measure of lexical association, is part of several algorithms used as models of lexical semantic memory. Typically, it is used as a component of more complex distributional models rather than in isolation. We show that when two simple techniques are applied—(1) down-weighting co-occurrences involving low-frequency words in order to address PMI’s so-called “frequency bias,” and (2) defining co-occurrences as counts of “events in which instances of word1 and word2 co-occur in a context” rather than “contexts in which word1 and word2 co-occur”—then PMI outperforms default parameterizations of word embedding models in terms of how closely it matches human relatedness judgments. We also identify which down-weighting techniques are most helpful. The results suggest that simple measures may be capable of modeling certain phenomena in semantic memory, and that complex models which incorporate PMI might be improved with these modifications.
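The two modifications can be illustrated with a short sketch. The code below is an assumption-laden illustration rather than the paper's exact method: the function name, the toy data, and the choice of context-distribution smoothing (raising unigram counts to a power α < 1) as the down-weighting scheme are all illustrative. It counts co-occurrences as events between token instances within a context, then computes PMI against a smoothed unigram distribution so that pairs involving low-frequency words receive lower scores.

```python
from collections import Counter
from itertools import combinations
from math import log2

def event_pmi(contexts, alpha=0.75):
    """PMI over co-occurrence *events*, with low-frequency down-weighting.

    contexts : iterable of token lists (e.g., sentences or windows).
    alpha    : exponent < 1 applied to unigram counts; flattening the
               unigram distribution reduces PMI's bias toward rare words.
               (One illustrative down-weighting scheme among several.)
    """
    word_counts = Counter()
    pair_counts = Counter()
    for tokens in contexts:
        word_counts.update(tokens)
        # Event-based counting: every pairing of token *instances* within a
        # context adds one co-occurrence, rather than each context
        # contributing at most one count per word pair.
        for w1, w2 in combinations(tokens, 2):
            pair_counts[tuple(sorted((w1, w2)))] += 1

    total_events = sum(pair_counts.values())
    # Smoothing: counts ** alpha flattens the unigram distribution, which
    # lowers the inflated PMI scores of pairs involving rare words.
    smoothed = {w: c ** alpha for w, c in word_counts.items()}
    norm = sum(smoothed.values())
    p_word = {w: s / norm for w, s in smoothed.items()}

    pmi = {}
    for (w1, w2), c in pair_counts.items():
        p_joint = c / total_events
        pmi[(w1, w2)] = log2(p_joint / (p_word[w1] * p_word[w2]))
    return pmi

# Toy usage on two short contexts:
contexts = [["the", "cat", "chased", "the", "mouse"],
            ["the", "dog", "chased", "the", "cat"]]
scores = event_pmi(contexts)
print(sorted(scores.items(), key=lambda kv: -kv[1])[:3])
```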
