Scaling up psycholinguistics
- Author: Smith, Nathaniel J.
This dissertation contains several projects, each addressing different questions with different techniques. In chapter 1, I argue that they are unified thematically by their goal of 'scaling up psycholinguistics': all of them analyze large datasets with pattern-revealing tools in order to propose and test mechanism-neutral hypotheses about the brain's language processing architecture.

In chapter 2, I investigate the well-known phenomenon that words which are more predictable in context are read faster than words which are not. I suggest that this is best understood as a special case of a more general phenomenon, in which more predictable events, linguistic or not, are processed more quickly than unpredictable ones, and I propose a general model of why this happens. When combined with the constraints imposed by language's incrementally processed serial structure, this model predicts a logarithmic relationship between word probability and reading time, and I show that this relationship in fact holds in two large data sets.

In chapter 3, I turn to the question of how the brain produces fine-grained quantitative predictions; computationally, this seems impossible given the sparsity of the data available to it. This suggests that the brain uses some kind of learning biases to guide generalization (a classic poverty-of-the-stimulus argument), and I propose to study these biases by comparing subjective (cloze) probabilities with objective corpus probabilities. In Experiment 1, I find a number of candidate biases; in a series of follow-up experiments, I argue that some are probably artifactual, but that others may in fact give clues to the brain's mechanisms for generating linguistic expectations from experience.

Chapters 4 and 5 develop a methodology for extending ERP analysis to handle continuously varying stimulus attributes, partial confounding, and overlapping brain responses to events occurring in quick succession.
These techniques are motivated by the desire to analyze EEG recorded in more naturalistic language paradigms (which present all of the above challenges), but they are not specific to language studies and have many potential applications for EEG/MEG research more generally.
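The logarithmic relationship described in chapter 2 can be sketched concretely: if reading time is linear in a word's surprisal (its negative log probability), then halving a word's probability always adds the same fixed amount of reading time, no matter where on the probability scale the halving occurs. The sketch below illustrates this arithmetic; the intercept and slope values are hypothetical placeholders, not estimates from the dissertation's data sets.

```python
import math

def surprisal(p):
    """Surprisal (in bits) of an event with probability p."""
    return -math.log2(p)

# Hypothetical coefficients, for illustration only: under a
# surprisal-based account, reading time is a linear function of
# surprisal, i.e. logarithmic in word probability.
BASELINE_MS = 250.0  # hypothetical reading time for a fully predicted word
MS_PER_BIT = 10.0    # hypothetical slowdown per bit of surprisal

def predicted_reading_time(p_word):
    """Predicted reading time (ms) for a word with contextual probability p_word."""
    return BASELINE_MS + MS_PER_BIT * surprisal(p_word)

# Each halving of probability adds the same constant cost:
delta_high = predicted_reading_time(0.25) - predicted_reading_time(0.5)
delta_low = predicted_reading_time(0.125) - predicted_reading_time(0.25)
print(delta_high, delta_low)  # both equal MS_PER_BIT
```

This equal-cost-per-halving property is exactly what distinguishes a logarithmic probability–reading-time relationship from, say, a linear one, where rare words would show disproportionately larger slowdowns.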