eScholarship
Open Access Publications from the University of California

Using Natural Language Processing Models to Evaluate STEM Book Coherence

Creative Commons 'BY' version 4.0 license
Abstract

Learning in the STEM disciplines depends on high-quality STEM books, but choosing a textbook can be difficult in the absence of objective measures of text quality. Here we compared two natural language processing approaches for evaluating text cohesion. In Coh-Metrix (Graesser et al., 2004), text cohesion is indicated by the mean cosine value of all possible pairs of sentence vectors, with sentence vectors based on LSA. We introduce a new method for measuring text coherence based on the deep learning language model RoBERTa (Liu et al., 2019). In this new approach, coherence is measured by determining the average predictability of all of the words in the text, with word predictability a function of each word's linguistic context. Coherence as measured by RoBERTa more closely matched the coherence ratings of human judges than did Coh-Metrix. Implications for the assessment and categorization of STEM books are discussed.
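
As a rough illustration of the two measures described in the abstract, the sketch below computes (a) the mean cosine over all pairs of sentence vectors and (b) the average per-word predictability of a text. It is a minimal sketch under stated assumptions, not the authors' implementation. The function names, the toy vectors, and the `prob_in_context` callback are hypothetical. In practice the sentence vectors would presumably come from an LSA model and the per-word probabilities from a masked language model such as RoBERTa. Averaging raw probabilities, rather than log-probabilities, is also an assumption, since the abstract does not specify the exact aggregation.

```python
import math
from itertools import combinations
from typing import Callable, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_pairwise_cosine(sentence_vectors: Sequence[Sequence[float]]) -> float:
    """Mean cosine over all unordered pairs of sentence vectors.

    Assumption: the vectors are precomputed (e.g. by an LSA model).
    """
    pairs = list(combinations(sentence_vectors, 2))
    if not pairs:
        raise ValueError("need at least two sentence vectors")
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def mean_word_predictability(
    words: Sequence[str],
    prob_in_context: Callable[[Sequence[str], int], float],
) -> float:
    """Average predictability of every word given its context.

    Assumption: prob_in_context(words, i) returns the probability a language
    model assigns to words[i] given the remaining words; it is a hypothetical
    callback standing in for a masked language model.
    """
    if not words:
        raise ValueError("need at least one word")
    total = sum(prob_in_context(words, i) for i in range(len(words)))
    return total / len(words)
```

Passing the language model in as a callback keeps the averaging step separate from the choice of model, so the same function could be reused with any scorer that returns a per-word probability.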
