eScholarship
Open Access Publications from the University of California

Corpus-based topic modeling for the cognitive study of the 21st century sociocultural challenges

Abstract

The results were obtained in a two-stage study. In the first stage (2018), linguists analyzed the conceptual domain of sociocultural challenges using a purpose-built Russian-language THREAT-corpus (10.4 million words) and constructed a frame of the domain. In the second stage (2018-2019), automated topic modeling was applied to two Russian-language corpora: the THREAT-corpus and an alternative corpus collected with the WebBootCaT tool in the Sketch Engine corpus management system. Topic modeling methods (PLSA, LDA, BigARTM, and others) made it possible to elicit thematic profiles for the texts of both corpora. The two datasets were compared using set theory, graph theory, and probabilistic analysis. Combining topic modeling with linguistic frame analysis yielded more precise configurations of cognitive models in the conceptual domain of sociocultural challenges. The frequency of lexemes manifesting sociocultural challenges proved to be an important factor in the representation of conceptual structures.
