Distributional Information and the Acquisition of Linguistic Categories: A Statistical Approach
eScholarship
Open Access Publications from the University of California

Abstract

Distributional information, in the form of simple, locally computed statistics of an input corpus, provides a potential means of establishing initial syntactic categories (noun, verb, etc.). Finch and Chater (1991, 1992) clustered words hierarchically, according to the distribution of local contexts in which they appeared in large, written English corpora, obtaining clusters that corresponded well with the standard syntactic categories. Here, a stronger demonstration of their method is provided, using 'real' data, that to which children are exposed during category acquisition, taken from the CHILDES corpus. For 2.5 million words of adult speech, clustering on syntactic and semantic bases was observed, with a high degree of clear differentiation between syntactic categories. For child data, some noun and verb clusters emerged, with some evidence of other categories, but the data set was too small for reliable trends to emerge. Some initial results investigating the possibility of classifying novel words using only the immediate context of a single instance are also presented. These results demonstrate that statistical information may play an important role in the processes of early language acquisition.
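
The general idea of clustering words by their local contexts can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the similarity measure or linkage rule, so cosine similarity and average linkage are assumed here, the context window is assumed to be one word on either side, and the toy corpus and function names (`context_vectors`, `cosine`, `cluster`) are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(tokens):
    """For each word, count the words seen immediately before and after it."""
    vectors = defaultdict(Counter)
    for i, word in enumerate(tokens):
        if i > 0:
            vectors[word][("prev", tokens[i - 1])] += 1
        if i < len(tokens) - 1:
            vectors[word][("next", tokens[i + 1])] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (an assumed measure)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(vectors, n_clusters):
    """Agglomerative clustering: repeatedly merge the most similar pair of clusters.

    Average linkage is an assumption made for this sketch.
    """
    clusters = [[w] for w in sorted(vectors)]

    def linkage(c1, c2):
        total = sum(cosine(vectors[x], vectors[y]) for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        pairs = ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters)))
        i, j = max(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Toy corpus, invented purely for illustration.
toy_tokens = "the dog runs . the cat runs . a dog sleeps . a cat sleeps .".split()
toy_vectors = context_vectors(toy_tokens)
toy_clusters = cluster(toy_vectors, 4)
print(toy_clusters)
```

On this toy corpus, "dog" and "cat" occur in identical contexts, as do "runs" and "sleeps", so the four clusters separate punctuation, determiners, nouns and verbs. Real child-directed speech is far noisier, which is why the abstract stresses corpus size.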
