eScholarship
Open Access Publications from the University of California

Character-based Surprisal as a Model of Reading Difficulty in the Presence of Errors

Abstract

Intuitively, human readers cope easily with errors in text: typos, misspellings, word substitutions, etc. do not unduly disrupt natural reading. Previous work indicates that letter transpositions result in increased reading times, but it is unclear whether this effect generalizes to more natural errors. In this paper, we report an eye-tracking study that compares two error types (letter transpositions and naturally occurring misspellings) and two error rates (10% or 50% of all words contain errors). We find that human readers show unimpaired comprehension in spite of these errors, but error words cause more reading difficulty than correct words. Also, transpositions are more difficult than misspellings, and a high error rate increases difficulty for all words, including correct ones. We then present a computational model that uses character-based (rather than traditional word-based) surprisal to account for these results. The model explains why transpositions are harder than misspellings: they contain unexpected letter combinations. It also explains the error rate effect: expectations about upcoming words are harder to compute when the context is degraded, leading to increased surprisal.
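To make the idea of character-based surprisal concrete, here is a minimal sketch, not the authors' model. It assumes a toy character bigram model with add-one smoothing trained on a small hypothetical string. Each character's surprisal is $-\log_2 P(c_i \mid c_{i-1})$, and a word's surprisal is the sum over its characters plus a trailing space that marks the word boundary. The function names `train_bigram`, `char_surprisal`, and `word_surprisal` are illustrative only.

```python
import math
from collections import Counter

def train_bigram(corpus: str):
    """Count character bigrams and their left contexts in a toy training string."""
    bigrams = Counter(zip(corpus, corpus[1:]))
    contexts = Counter(corpus[:-1])  # every character that has a successor
    vocab = set(corpus)
    return bigrams, contexts, vocab

def char_surprisal(prev: str, ch: str, model) -> float:
    """Surprisal in bits of `ch` given the previous character (add-one smoothing)."""
    bigrams, contexts, vocab = model
    v = len(vocab) + 1  # +1 reserves probability mass for unseen characters
    p = (bigrams[(prev, ch)] + 1) / (contexts[prev] + v)
    return -math.log2(p)

def word_surprisal(context: str, word: str, model) -> float:
    """Sum of per-character surprisals of `word` plus a trailing space,
    conditioned only on the last character of `context` (bigram assumption)."""
    total = 0.0
    prev = context[-1] if context else " "
    for ch in word + " ":
        total += char_surprisal(prev, ch, model)
        prev = ch
    return total

# Hypothetical training text, used only for illustration.
model = train_bigram("the cat sat on the mat and the dog ate the hat " * 20)
print(word_surprisal("of ", "the", model))  # correct word
print(word_surprisal("of ", "hte", model))  # transposition: contains unseen bigram "ht"
```

The transposed form receives higher surprisal because it contains a letter combination ("ht") the model never saw. A bigram model only conditions on one preceding character, so it cannot reproduce the error-rate effect on context. Capturing that effect would require a character model with a longer context window.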
