eScholarship
Open Access Publications from the University of California

Incorporating Semantic Constraints into Algorithms for Unsupervised Learning of Morphology

Abstract

A key challenge in language acquisition is learning morphological transforms relating word roots to derived forms. Unsupervised learning algorithms can perform morphological segmentation by finding patterns in word strings (e.g. Goldsmith, 2001), but struggle to distinguish valid segmentations from spurious ones because they look only at sequences of characters (or phonemes) and ignore meaning. For example, a system that correctly discovers <add -s> as a valid suffix from seeing dog, dogs, cat, cats, etc., might incorrectly infer that <add -et> is also a valid suffix from seeing bull, bullet, mall, mallet, etc. We propose that learners could avoid these errors with a simple semantic assumption: morphological transforms should approximately preserve meaning. We extend an algorithm from Chan (2008) by integrating proximity in vector-space word embeddings as a criterion for valid transforms. On the Brown CHILDES corpus, we achieve both higher accuracy and broader coverage than the purely syntactic approach.
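The semantic criterion described above can be illustrated with a short sketch. This is a minimal, hypothetical example, not the algorithm of Chan (2008) or the authors' implementation: it proposes suffix transforms purely from string overlap, then keeps a transform only if enough of its (root, derived) pairs are close in embedding space, measured here by cosine similarity. The helper names (`candidate_suffixes`, `accept_transforms`), the three-dimensional toy vectors, the 0.8 similarity threshold, and the 0.5 support ratio are all assumptions invented for the example; a real system would use embeddings trained on a corpus.

```python
import math
from collections import defaultdict

# Toy 3-dimensional "embeddings", invented purely for illustration (hypothetical).
# A real learner would use vectors trained on a corpus.
EMBEDDINGS = {
    "dog":    [0.9, 0.1, 0.0],
    "dogs":   [0.85, 0.15, 0.05],
    "cat":    [0.8, 0.2, 0.1],
    "cats":   [0.78, 0.25, 0.1],
    "bull":   [0.7, 0.0, 0.3],
    "bullet": [0.0, 0.9, 0.1],
    "mall":   [0.1, 0.1, 0.9],
    "mallet": [0.2, 0.9, 0.0],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def candidate_suffixes(vocab, max_len=3):
    """String-only step: group (root, derived) pairs by the suffix that
    turns root into derived (e.g. "s" for dog -> dogs)."""
    pairs = defaultdict(list)
    for root in vocab:
        for derived in vocab:
            if derived != root and derived.startswith(root):
                suffix = derived[len(root):]
                if len(suffix) <= max_len:
                    pairs[suffix].append((root, derived))
    return pairs


def accept_transforms(vocab, emb, sim_threshold=0.8, min_support=0.5):
    """Semantic step (hypothetical criterion): keep a suffix only if at least
    `min_support` of its pairs have cosine similarity >= `sim_threshold`."""
    accepted = {}
    for suffix, pairs in candidate_suffixes(vocab).items():
        close = [(r, d) for r, d in pairs if cosine(emb[r], emb[d]) >= sim_threshold]
        if len(close) / len(pairs) >= min_support:
            accepted[suffix] = close
    return accepted


if __name__ == "__main__":
    vocab = list(EMBEDDINGS)
    # sim_threshold=-1.0 disables the semantic filter (string-only baseline).
    print(sorted(accept_transforms(vocab, EMBEDDINGS, sim_threshold=-1.0)))  # ['et', 's']
    # With the semantic filter, only the meaning-preserving suffix survives.
    print(sorted(accept_transforms(vocab, EMBEDDINGS)))  # ['s']
```

With the filter disabled, string overlap alone accepts both `s` and `et`; with it enabled, only `s` survives, because the made-up vectors place bullet and mallet far from bull and mall. Cosine similarity is one common proximity measure for word embeddings; the abstract does not say which measure or thresholds the authors use.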
