Word Probability Re-Estimation Using Topic Modeling and Lexical Decision Data
Two common assumptions in psycholinguistic research are that text corpora can serve as a proxy for the language people have been exposed to, and that the time people take to recognize a word decreases as the word's probability (or frequency) in a corpus increases. We propose a method that uses latent Dirichlet allocation (LDA) to derive topic-specific word probabilities from a text corpus, combines them to fit lexical decision reaction times, and thereby re-estimates word probabilities. Using independent lexical decision data, we evaluated how well reaction times could be predicted from the re-estimated word probabilities compared with the original probabilities. In a proof-of-concept experiment, the re-estimated word frequency model explained up to 9.6% of additional variability in reaction times at the group level and up to 2.9% at the level of individual participants.
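The abstract does not say how the topic-specific probabilities are combined or fitted, so the following is only a minimal sketch under stated assumptions, not the authors' procedure. It assumes the re-estimated probability is a weighted mixture p(w) = Σ_k π_k p(w | topic k), that the weights π are chosen by a simple random search over the simplex to maximize the R² of a linear regression of reaction time on log p(w), and that the topic-word matrix `phi` would come from a trained LDA model (here it is random toy data). The uniform mixture stands in, hypothetically, for the "original" corpus probabilities. The function names `mix_probabilities`, `r_squared` and `reestimate` are illustrative only.

```python
import numpy as np

def mix_probabilities(phi, weights):
    """Weighted mixture of topic-specific word distributions.

    phi: (K, V) array, row k is p(word | topic k); weights: (K,) on the simplex.
    Returns a (V,) array of combined word probabilities."""
    return weights @ phi

def r_squared(probs, rts):
    """R^2 of the least-squares fit RT ~ a + b * log(p)."""
    x = np.log(probs)
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, rts, rcond=None)
    resid = rts - X @ coef
    return 1.0 - (resid @ resid) / ((rts - rts.mean()) ** 2).sum()

def reestimate(phi, rts, n_candidates=2000, seed=0):
    """Random search over mixture weights (uniform mixture plus Dirichlet
    samples); keeps the weights whose mixed probabilities best fit the RTs."""
    rng = np.random.default_rng(seed)
    k = phi.shape[0]
    candidates = np.vstack([np.full(k, 1.0 / k),
                            rng.dirichlet(np.ones(k), size=n_candidates)])
    scores = np.array([r_squared(mix_probabilities(phi, w), rts) for w in candidates])
    best = int(np.argmax(scores))
    # Return best weights, their fit R^2, and the uniform mixture's fit R^2.
    return candidates[best], scores[best], scores[0]

# Toy data (hypothetical): 3 topics over a 50-word vocabulary, and two
# independent RT samples generated from a mixture unknown to the search.
# RTs fall as log-probability rises, as the abstract assumes.
rng = np.random.default_rng(1)
phi = rng.dirichlet(np.ones(50), size=3)
true_p = mix_probabilities(phi, np.array([0.7, 0.2, 0.1]))
rts_fit = 400 - 30 * np.log(true_p) + rng.normal(0, 10, 50)
rts_eval = 400 - 30 * np.log(true_p) + rng.normal(0, 10, 50)

weights, fit_r2, uniform_fit_r2 = reestimate(phi, rts_fit)
p_original = mix_probabilities(phi, np.full(3, 1 / 3))  # assumed stand-in for corpus p(w)
p_reestimated = mix_probabilities(phi, weights)
# Additional variance explained on the held-out RT sample.
gain = r_squared(p_reestimated, rts_eval) - r_squared(p_original, rts_eval)
print(weights.round(2), round(gain, 3))
```

Fitting the weights on one RT sample and scoring on a second, independent sample loosely mirrors the evaluation described in the abstract. Random search keeps the sketch dependency-free; a gradient-based optimizer over softmax-parameterized weights would be a natural replacement for larger numbers of topics.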