Unsupervised Analysis of Structured Human Artifacts
- Berg-Kirkpatrick, Taylor
- Advisor(s): Klein, Dan
Abstract
The presence of hidden structure in human data--including natural language but
also sources like music, historical documents, and other complex
artifacts--makes this data extremely difficult to analyze. In this thesis, we
develop unsupervised methods that can better cope with hidden structure across
several domains of human data. We accomplish this by incorporating rich domain
knowledge using two complementary approaches: (1) we develop detailed generative
models that more faithfully describe how data originated and (2) we develop
structured priors that create useful inductive bias.
First, we find that a variety of transcription tasks--for example, both historical
document transcription and polyphonic music transcription--can be viewed as
linguistic decipherment problems. By building a detailed generative model of the
relationship between the input (e.g. an image of a historical document) and its
transcription (the text the document contains), we are able to learn these models in a
completely unsupervised fashion--without ever seeing an example of an input
annotated with its transcription--effectively deciphering the hidden
correspondence. The resulting systems have turned out not only to work well for
both tasks--achieving state-of-the-art-results--but to outperform their
supervised counterparts.
Next, for a range of linguistic analysis tasks--for example, both word alignment and
grammar induction--we find that structured priors based on
linguistically motivated features can improve upon state-of-the-art generative
models. Further, by coupling model parameters in a phylogeny-structured prior
across multiple languages, we develop an approach to multilingual grammar
induction that substantially outperforms independent learning.