eScholarship
Open Access Publications from the University of California

UC Berkeley Electronic Theses and Dissertations

A Treebank of the Karuk Language

Abstract

In this dissertation, I introduce the Karuk treebank, a collection of syntactically parsed sentences of the Karuk language. The goals of this dissertation are, first, to describe the construction of the treebank and the rationale for its design and, second, to showcase the utility of the treebank through case studies in two domains: the order of arguments and predicates, and cases of unexpected agreement marking. The study of word order shows how the treebank helps us understand large-scale statistical patterns in Karuk syntax, and the study of agreement shows how it can be used to find rare and previously unstudied phenomena.

Chapter 1 provides the necessary background on the Karuk language, the history of its documentation by outsiders, and the treebank project itself. Chapter 2 presents the annotation guidelines, which were used as a manual to guide annotators and now serve as an explanation and description of the use of every element found in Karuk treebank annotations.

Chapter 3 presents the first set of case studies utilizing the treebank, focusing on the word order of arguments and predicates. Karuk word order has often been claimed to be 'free', with all or most orders of subject, direct object, and verb attested, but the relative prevalence of these orders, and the word orders of clauses with other types of argument (complements and indirect objects) or with non-verbal predicates, are elucidated in detail for the first time in this chapter. Methodologically, I argue in Chapter 3 that treebanks make large-scale, quantitative properties of corpora easier to study than comparable methods that do not use a treebank. In the case studies meant to showcase this, I describe broad word-order patterns in the treebank corpus and elucidate three trends found in this data: that subordinate clauses tend to have fewer expressed arguments than main clauses; that subjects are less likely to be expressed in transitive clauses; and that the prevalence of pre-verbal S is driven partly by a higher-than-expected tendency for subjects to be pre-verbal when both a subject and an object are present.
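As a rough illustration of the kind of counting a treebank makes easy, the sketch below tallies constituent orders and compares how often an overt subject precedes the verb with and without an overt object. It assumes a hypothetical, highly simplified representation in which each clause has already been reduced to the surface order of its S, O, and V labels; the actual treebank annotation format and query tools are not described here, and the clauses are invented toy data, not Karuk corpus data. A real analysis would also compare these rates against an expected baseline and test for significance.

```python
from collections import Counter

# Hypothetical toy clauses: each tuple is the left-to-right order of the
# overt constituents in one clause (S = subject, O = object, V = verb).
# Invented for illustration only; not data from the Karuk treebank.
clauses = [
    ("S", "O", "V"),
    ("O", "V"),
    ("S", "V"),
    ("V", "S"),
    ("S", "V", "O"),
    ("V",),
    ("O", "S", "V"),
]

def order_counts(clauses):
    """Tally each attested order of overt constituents, e.g. 'SOV' or 'V'."""
    return Counter("".join(c) for c in clauses)

def preverbal_subject_rate(clauses, with_object):
    """Share of clauses with an overt subject in which the subject precedes
    the verb, restricted to clauses that do (or do not) have an overt object."""
    relevant = [c for c in clauses if "S" in c and ("O" in c) == with_object]
    if not relevant:
        return None
    preverbal = sum(1 for c in relevant if c.index("S") < c.index("V"))
    return preverbal / len(relevant)

print(preverbal_subject_rate(clauses, with_object=True))   # 1.0
print(preverbal_subject_rate(clauses, with_object=False))  # 0.5
```

On this toy data, subjects are pre-verbal in every clause that also has an overt object but in only half of the clauses without one, which is the shape of comparison behind the third trend above.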

Chapter 4 presents the second set of case studies, focusing on two phenomena where the agreement observed in the corpus does not match the agreement expected from Bright's (1957) description of the agreement system: sentences with plural agreement where singular was expected, and vice versa. These two mismatches (and some inconsistency in the use of one particular agreement prefix for 3pl subjects and 3pl objects) turn out to be the only systematic departures from Bright's description of the agreement system, thus confirming its overall accuracy. Methodologically, in this chapter I argue that treebanks are useful for locating rare phenomena in corpora that may escape notice with comparable methods that do not use a treebank. In terms of the case studies, I elucidate in Chapter 4 a previously undescribed phenomenon whereby the use of plural subject agreement with a singular subject indicates subject demotion. Chapter 5 concludes with some thoughts about the future of the treebank project.
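The search for unexpected agreement described for Chapter 4 amounts to filtering annotated clauses for a mismatch between expected and observed marking. The sketch below is a minimal, hypothetical version: it assumes each clause already records the number predicted by a descriptive grammar (here standing in for Bright's description) and the number actually marked on the verb. The `Clause` record, its field names, and the toy clauses are all invented for illustration; real Karuk agreement prefixes also encode person and object features, which this sketch ignores.

```python
from dataclasses import dataclass

@dataclass
class Clause:
    clause_id: str        # hypothetical clause identifier
    expected_number: str  # number predicted by the descriptive rules: "sg" or "pl"
    observed_number: str  # number actually marked by the agreement prefix

def agreement_mismatches(clauses):
    """Group clauses whose observed agreement disagrees with the expected
    agreement, by direction of the mismatch."""
    groups = {"pl_for_sg": [], "sg_for_pl": []}
    for c in clauses:
        if c.expected_number == "sg" and c.observed_number == "pl":
            groups["pl_for_sg"].append(c.clause_id)
        elif c.expected_number == "pl" and c.observed_number == "sg":
            groups["sg_for_pl"].append(c.clause_id)
    return groups

# Invented toy data, not from the Karuk treebank.
toy = [
    Clause("c1", "sg", "sg"),
    Clause("c2", "sg", "pl"),
    Clause("c3", "pl", "pl"),
    Clause("c4", "pl", "sg"),
    Clause("c5", "sg", "sg"),
]
print(agreement_mismatches(toy))  # {'pl_for_sg': ['c2'], 'sg_for_pl': ['c4']}
```

Because the filter runs over every annotated clause, even a handful of mismatches in a large corpus surfaces for inspection, which is the advantage over reading texts manually.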
