eScholarship
Open Access Publications from the University of California

Towards describing Tibetan syntax: From word segmentation to rewrite rules through a semi-automated workflow

Abstract

The first task in Tibetan Natural Language Processing is word segmentation. We present a lightweight segmentation tool based on lexical resources. It runs natively in InDesign, and users can update it with their manual corrections of its output. We then propose a semi-automated workflow for syntactic analysis that uses utterance simplification and intonation cues to obtain precise information about syntactic structure. Non-specialised native speakers are thus able to provide us with precise information about the structure of utterances. This will allow the scientific community to obtain the resources needed to initiate the study of Tibetan syntax. In the process, informants will obtain educational material generated from the utterances they have processed.
