Towards describing Tibetan syntax: From word segmentation to rewrite rules through a semi-automated workflow
- Author(s): Hildt, Hélios
- et al.
Published Web Location: https://doi.org/10.5070/H915129932
The first task in Tibetan Natural Language Processing is word segmentation. We present a lightweight segmentation tool based on lexical resources. It can be executed natively in InDesign, and the user can update it with manual corrections of its output. We then propose a semi-automated workflow aimed at syntactic analysis, which uses utterance simplification and intonation cues to obtain precise information about syntactic structure. Non-specialised native speakers are thus able to provide us with precise information about the structure of utterances. This will allow the scientific community to obtain the resources needed to initiate the study of Tibetan syntax. In the process, informants will obtain educational material generated from the utterances they have processed.
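The abstract does not say which algorithm the segmentation tool uses. As an illustration of what lexicon-based segmentation can look like, here is a minimal Python sketch of one common approach: greedy longest-match over syllables delimited by the tsheg (U+0F0B). The names `segment`, `add_correction`, and `max_len` are hypothetical, and the sketch is not the authors' InDesign implementation.

```python
TSHEG = "\u0f0b"  # Tibetan syllable delimiter (tsheg)


def segment(text, lexicon, max_len=4):
    """Greedy longest-match segmentation (an assumed technique, not the paper's).

    `lexicon` is a set of words written as syllables joined by TSHEG.
    `max_len` is an assumed upper bound on word length in syllables.
    """
    syllables = [s for s in text.split(TSHEG) if s]
    words = []
    i = 0
    while i < len(syllables):
        # Try the longest candidate first, then shrink.
        for n in range(min(max_len, len(syllables) - i), 0, -1):
            candidate = TSHEG.join(syllables[i:i + n])
            # A single syllable is always accepted as a fallback.
            if n == 1 or candidate in lexicon:
                words.append(candidate)
                i += n
                break
    return words


def add_correction(lexicon, word_syllables):
    """Record a manually corrected word so later runs can match it."""
    lexicon.add(TSHEG.join(word_syllables))
```

For example, using romanized placeholders in place of Tibetan script, a lexicon containing the two-syllable entry `bod` + tsheg + `skad` makes `segment` return that entry as one word and leave a following `yin` as a separate word. Calling `add_correction(lexicon, ["yin", "pa"])` would then let a later run group those two syllables together.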