Scholarly Works (3 results)

Article
Peer Reviewed

Diachronic Entropy Rate in Language Evolution: A Case Study of 2500 Years of Historical Chinese

Proceedings of the Annual Meeting of the Cognitive Science Society, Volume 43 (2021)

Information theory (Shannon, 1948) plays an important role in psycholinguistic and linguistic theories (Genzel & Charniak, 2002; Hale, 2003; Levy, 2008). Here, we examine how entropy rate, a measure of the information content encoded in each individual word, changes diachronically in Chinese. We conduct a computational study of the four main development stages of Chinese: Old Chinese, Middle Chinese, Early Modern Chinese, and Modern Chinese. We approximate the entropy rate of each century with a diachronic trigram language model with interpolated Kneser-Ney smoothing (Chen & Goodman, 1999), trained on multiple comprehensive data sets selected according to Chinese philology studies (Wang, 1980; Gao & Jing, 2005) and covering over 2,500 years of corpus data. Our modeling results show that entropy rate increases by 0.026 per century on average. Within each major stage, historical Chinese demonstrates a steady rise in entropy rate, suggesting a vocabulary increase, whereas entropy rate tends to fluctuate more in transitional stages, around the 10th and 15th centuries, lending support to the hypothesis that grammar competition in language contact is one of the driving forces behind major changes in diachronic Chinese. Our study demonstrates the interaction between psycholinguistic pressures and the evolution of linguistic systems.
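
As a rough illustration of how an entropy rate can be approximated from a language model, the sketch below computes the average negative log2 probability per token on held-out text. It is not the authors' setup: it uses a bigram model with add-k smoothing as a simpler stand-in for the trigram model with interpolated Kneser-Ney smoothing described above, and the names `train_bigram`, `cross_entropy_bits`, `UNK`, and the toy token sequences are all assumptions made for illustration.

```python
import math
from collections import Counter

UNK = "<unk>"  # placeholder for tokens unseen in training (an assumption of this sketch)


def train_bigram(tokens):
    """Collect context counts, bigram counts, and the training vocabulary."""
    contexts = Counter(tokens[:-1])
    bigrams = Counter(zip(tokens[:-1], tokens[1:]))
    return contexts, bigrams, set(tokens)


def cross_entropy_bits(tokens, model, k=1.0):
    """Average -log2 P(token | previous token) under add-k smoothing.

    Note: add-k smoothing is a stand-in here, not Kneser-Ney.
    """
    contexts, bigrams, vocab = model
    size = len(vocab) + 1  # +1 reserves probability mass for UNK
    mapped = [t if t in vocab else UNK for t in tokens]
    pairs = list(zip(mapped[:-1], mapped[1:]))
    total = 0.0
    for prev, cur in pairs:
        p = (bigrams[(prev, cur)] + k) / (contexts[prev] + k * size)
        total -= math.log2(p)
    return total / len(pairs)


# Toy usage with hypothetical training and held-out token sequences.
model = train_bigram(list("ababab"))
rate = cross_entropy_bits(list("abab"), model)
print(round(rate, 3))
```

Under these assumptions, a diachronic comparison would repeat the same computation once per period, training and evaluating on text from each century and then comparing the resulting per-token values.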

Creative Commons 'BY' version 4.0 license

Article
Peer Reviewed

Finding probabilistic context-free grammar in Chinese writing system

Proceedings of the Annual Meeting of the Cognitive Science Society, Volume 42 (2020)

Writing systems play a very important role in human languages, but the mathematical nature of writing systems remains understudied. Here, we conduct a case study of an open-class writing system, Chinese characters, which consists of a set of expandable basic units, in contrast to most other writing systems, whose basic units form closed sets (closed-class systems). We demonstrate that probabilistic context-free grammars underlie the representation of Chinese writing by formalizing Chinese characters as a grammar with character shapes as nonterminal rules and components as terminal nodes. Rule probabilities are estimated from a character treebank of the 3,500 most frequent characters. Exploratory analysis reveals Zipfian distributions of both shapes and components. Our experiments also demonstrate that the Chinese writing system shows generative power similar to that of a PCFG, with 78% of the noncharacters generated from our grammar judged acceptable, which suggests fundamental differences between open-class and closed-class writing systems.
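
To make the estimation step concrete, the sketch below derives rule probabilities by relative frequency from a tiny hand-built treebank. It is only an illustration under stated assumptions: the single nonterminal `C`, the shape labels `LR` (left-right) and `TB` (top-bottom), the nested-tuple tree encoding, and the four example decompositions are hypothetical choices, not the paper's treebank format or grammar.

```python
from collections import Counter


def count_rules(tree, counts):
    """Count one expansion of the nonterminal C per tree node."""
    if isinstance(tree, str):
        # Leaf: C rewrites to a component (terminal).
        counts[("C", tree)] += 1
    else:
        # Internal node: C rewrites via a shape into child C nodes.
        shape, *children = tree
        counts[("C", shape, len(children))] += 1
        for child in children:
            count_rules(child, counts)


def estimate_pcfg(treebank):
    """Relative-frequency estimate of rule probabilities for C."""
    counts = Counter()
    for tree in treebank:
        count_rules(tree, counts)
    total = sum(counts.values())
    return {rule: c / total for rule, c in counts.items()}


# Hypothetical toy treebank: (shape, child, child) with components as strings.
treebank = [
    ("LR", "女", "子"),               # 好
    ("LR", "日", "月"),               # 明
    ("TB", "田", "力"),               # 男
    ("TB", ("LR", "木", "目"), "心"),  # 想
]
probs = estimate_pcfg(treebank)
print(probs[("C", "LR", 2)], probs[("C", "TB", 2)])
```

Because every rule here expands the same nonterminal, the estimated probabilities sum to one; a richer grammar with several nonterminals would normalize separately for each left-hand side.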

Article
Peer Reviewed

English Speakers Produce and Understand Expletive Negation

Proceedings of the Annual Meeting of the Cognitive Science Society, Volume 42 (2020)

Romance languages are well known for their use of expletive negation (henceforth, EN), i.e., the occurrence of a negator in the complement clause of certain verbs, adpositions or adverbs that is “illogically” not part of the meaning of the sentence. This study explores the hypothesis that such “illogism” that recurs across languages must be due to universal properties of the message to be encoded and the language production system. Jin & Koenig (2019) proposed a language production model to account for the striking similarity of EN-triggers between two unrelated languages (French and Mandarin). Their model makes several predictions which our paper tests: (i) languages like English where EN is purported not to occur should in fact include the same range of EN-triggers; (ii) English speakers can understand a negator within the scope of an EN-trigger expletively; (iii) the likelihood that a speaker of English will understand a negator expletively is correlated with how frequently she has encountered an expletive interpretation of negators for that particular trigger. To test the first prediction, we conducted a corpus study of unrehearsed English speech on Google. To test the second prediction, we conducted a semantic Stroop-like comprehension experiment where participants’ semantic judgements (both logical accuracy and response time) were dependent on whether a negator was interpreted logically or expletively. Overall, this paper suggests that EN is by no means specific to Romance languages and that expletive uses of negators occur in the same contexts in both production and comprehension in languages where EN is not conventionalized to the same degree it is in Romance. More broadly, our results support the claim that “illogical” properties of natural languages that recur across languages of the world reflect universal properties of the language production system.
