eScholarship
Open Access Publications from the University of California

Large-scale Network Analyses Reveal Cross-Language Differences in Semantic Structures: A Comparative Study

Creative Commons 'BY' version 4.0 license
Abstract

English and Mandarin Chinese differ in many respects, including orthography and morphology. Previous network analyses have found high clustering coefficients (C) in English semantic networks, reflecting how densely the semantic representations of words are interconnected. However, it is unclear whether these properties are language-specific or general, and whether differences in linguistic features (e.g., subword components such as orthography and morphology) affect lexico-semantic structure. Here, we compared the C values of words in English and Mandarin semantic networks built from (a) feature norms empirically derived from human subjects and (b) distributed semantic information extracted from text by word embedding models. We consistently observed higher C values for Mandarin words than for English words, especially when the semantic network took subword features into account. Linear regressions suggested that the semantic properties of subword components significantly and positively predicted the C of words in semantic networks in Mandarin, but not in English. The results indicate an important role for language-specific properties in lexico-semantic structure and point to diversity in human language processing.
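
As a rough illustration of the measure discussed above, the sketch below computes a local clustering coefficient for individual words in a toy network. It assumes an unweighted, undirected graph and the standard definition of C (the fraction of a node's neighbour pairs that are themselves connected). The abstract does not specify how the networks were built, so the function names, the edge list, and the example words are all hypothetical.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbours)} from edge pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def local_clustering(adj, node):
    """Fraction of pairs of a node's neighbours that are linked to each other."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        # With fewer than two neighbours there are no pairs to compare.
        return 0.0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


# Hypothetical toy network: an edge means two words are semantically related.
edges = [("dog", "cat"), ("dog", "wolf"), ("cat", "wolf"), ("dog", "bone")]
adj = build_adjacency(edges)

for word in sorted(adj):
    print(word, round(local_clustering(adj, word), 3))
```

In this toy network, "dog" has three neighbours but only one of the three possible neighbour pairs is connected, so its C is 1/3, whereas both neighbours of "cat" are linked to each other, giving C = 1.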
