eScholarship
Open Access Publications from the University of California

On the Predictive Power of Neural Language Models for Human Real-Time Comprehension Behavior

Abstract

Human reading behavior is tuned to the statistics of natural language: the time it takes human subjects to read a word can be predicted from estimates of the word's probability in context. However, it remains an open question what computational architecture best characterizes the expectations deployed in real time by humans that determine the behavioral signatures of reading. Here we test over two dozen models, independently manipulating computational architecture and training dataset size, on how well their next-word expectations predict human reading time behavior on naturalistic text corpora. Consistent with previous work, we find that across model architectures and training dataset sizes the relationship between word log-probability and reading time is (near-)linear. We next evaluate how features of these models determine their psychometric predictive power, or ability to predict human reading behavior. In general, the better a model's next-word expectations (as measured by the traditional language modeling perplexity objective), the better its psychometric predictive power. However, we find nontrivial differences in psychometric predictive power across model architectures. For any given perplexity, deep Transformer models and n-gram models generally show superior psychometric predictive power over LSTM or structurally supervised neural models, especially for eye movement data. Finally, we compare models' psychometric predictive power to the depth of their syntactic knowledge, as measured by a battery of syntactic generalization tests developed using methods from controlled psycholinguistic experiments. Once perplexity is controlled for, we find no significant relationship between syntactic knowledge and predictive power.
These results suggest that, at least for the present state of natural language technology, different approaches may be required to best model human real-time language comprehension behavior in naturalistic reading versus behavior for controlled linguistic materials designed for targeted probing of syntactic knowledge.
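The core analysis the abstract describes — regressing human reading times on a model's per-word log-probabilities and summarizing the fit — can be sketched as below. This is a minimal illustration with hypothetical numbers, not the paper's actual data or pipeline; the probabilities and reading times here are invented, and a real analysis would use per-word probabilities from a trained language model and reading times from eye-tracking or self-paced reading corpora.

```python
import numpy as np

# Hypothetical per-word probabilities in context, as a language model might
# assign them (invented for illustration; not data from the paper).
probs = np.array([0.20, 0.05, 0.50, 0.01, 0.10])
surprisal = -np.log2(probs)  # word surprisal (negative log-probability), in bits

# Hypothetical per-word reading times in milliseconds.
reading_times = np.array([260.0, 310.0, 230.0, 380.0, 290.0])

# Fit the (near-)linear relationship reported in the abstract:
# RT ~ slope * surprisal + intercept.
slope, intercept = np.polyfit(surprisal, reading_times, deg=1)

# One simple summary of psychometric predictive power is the R^2 of this fit
# (the paper itself uses more elaborate regression-based measures).
predicted = slope * surprisal + intercept
ss_res = np.sum((reading_times - predicted) ** 2)
ss_tot = np.sum((reading_times - reading_times.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot
```

A positive slope corresponds to the standard finding that less probable (higher-surprisal) words are read more slowly.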
