eScholarship
Open Access Publications from the University of California

UC Santa Barbara

UC Santa Barbara Electronic Theses and Dissertations

Serialization and Hierarchy: From Data to Corpus in Linguistic Fieldwork

Abstract

The structure of digital documentation should empower linguists to search the entirety of a documentary archive, even when multiple tools were used to enter the data into that archive. Current practice tends to fragment archives along the lines of the tools themselves: data entered in one tool generally cannot be searched outside of that tool, and thus data produced by distinct tools cannot be combined and aggregated. Solving this problem requires an approach more general than a critique of existing software. In this thesis I elaborate a general but simple way to digitize linguistic data, one which enables annotation at arbitrary levels of linguistic or annotative detail. A key benefit of this approach is that the resulting documentation can be searched across levels of language structure and across varied sorts of linguistic annotation. It is hoped that this simpler, more general design for archival documentation will contribute to the production of linguistic documentation upon which, in principle, many sorts of software could operate, and which would be of use to linguists with many and varied interests.
