Open Access Publications from the University of California

Scruffy text understanding: design and implementation of tolerant understanders


Most large text-understanding systems have been designed under the assumption that the input text will be in reasonably "neat" form, e.g., newspaper stories and other prepared texts, which typically consist of well-formed sentences and a logical order of presentation of concepts. However, many everyday uses of natural text (e.g., business memos, phone messages, notes) are very ill-formed, containing misspelled words, ungrammatical or only partially complete sentences, and poorly organized ideas.

Many large organizations employ people whose sole job is to take ill-formed messages and encode them into machine-readable form for entry into a database. In such cases, there would be great potential benefit if that encoding process could be partially automated. This requires a "scruffy text understander", i.e., a system that can correctly analyze the content of such texts in spite of their scruffiness.

This paper describes the design and implementation of the NOMAD system, which partially automates the encoding of poorly written Navy messages into well-formed formats. It also describes a number of problems that arise in the scruffy text domain and that previous text-understanding systems have not dealt with.
