eScholarship
Open Access Publications from the University of California

Generalized Representation of Syntactic Structures

Abstract

Analysis of language provides important insights into the underlying psychological properties of individuals and groups. While the majority of language analysis work in psychology has focused on semantics, psychological information is encoded not just in what people say, but in how they say it. In the current work, we propose Conversation Level Syntax Similarity Metric-Group Representations (CASSIM-GR). This tool builds generalized representations of the syntactic structures of documents, thus allowing researchers to distinguish between people and groups based on syntactic differences. CASSIM-GR builds on the Conversation Level Syntax Similarity Metric by applying spectral clustering to syntactic similarity matrices and calculating the center of each cluster of documents. The resulting cluster centroid then represents the syntactic structure of the group of documents. To examine the effectiveness of CASSIM-GR, we conduct three experiments across three unique corpora. In each experiment, we calculate the clustering accuracy and compare our proposed technique to a bag-of-words approach. Our results provide evidence for the effectiveness of CASSIM-GR and demonstrate that combining syntactic similarity and tf-idf semantic information improves the total accuracy of group classification.
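
As a rough illustration of the "center of each cluster" step, the sketch below picks one representative document per cluster from a pairwise similarity matrix. It is not the authors' implementation. The function name `cluster_medoids`, the toy similarity values, and the cluster labels are all hypothetical, and the choice of a medoid (the document with the highest mean similarity to the rest of its cluster) is an assumption about how a center might be defined when only pairwise similarities are available.

```python
from collections import defaultdict


def cluster_medoids(similarity, labels):
    """Map each cluster label to the index of its medoid document.

    Assumption: the "center" of a cluster is the document whose mean
    similarity to the other documents in the same cluster is highest.
    """
    # Group document indices by their cluster label.
    members = defaultdict(list)
    for idx, label in enumerate(labels):
        members[label].append(idx)

    medoids = {}
    for label, idxs in members.items():
        def mean_similarity(i):
            others = [j for j in idxs if j != i]
            if not others:
                return 0.0  # a singleton cluster is its own center
            return sum(similarity[i][j] for j in others) / len(others)

        medoids[label] = max(idxs, key=mean_similarity)
    return medoids


# Toy symmetric similarity matrix for six documents (hypothetical values).
similarity = [
    [1.0, 0.9, 0.8, 0.1, 0.2, 0.1],
    [0.9, 1.0, 0.7, 0.2, 0.1, 0.1],
    [0.8, 0.7, 1.0, 0.1, 0.1, 0.2],
    [0.1, 0.2, 0.1, 1.0, 0.6, 0.5],
    [0.2, 0.1, 0.1, 0.6, 1.0, 0.9],
    [0.1, 0.1, 0.2, 0.5, 0.9, 1.0],
]
# Cluster labels as they might come out of a spectral clustering step.
labels = [0, 0, 0, 1, 1, 1]

medoids = cluster_medoids(similarity, labels)
```

In practice the labels would come from a spectral clustering routine run on the similarity matrix, for example scikit-learn's `SpectralClustering` with `affinity="precomputed"`. They are hard-coded here so the sketch depends only on the standard library.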
