We present a large-scale conceptual replication of an experiment that provided evidence of a false consensus bias in legal interpretation: when reading a legal contract, individuals tend to overestimate the extent to which others would agree with their interpretation of that contract (Solan, Rosenblatt, & Osherson, 2008). Our results are consistent with this previous finding. We also observe substantial unexplained item-level variation in the degree to which individuals agree on contract interpretation, as well as unexplained variation in the extent to which the false consensus bias holds across different contexts.
In a first step towards understanding the source(s) of this variability, we show that a state-of-the-art large language model (LLM) with zero-shot prompting does not robustly predict the degree to which interpreters will exhibit consensus in a given context. However, performance improves when the model is exposed to data of the form collected in our experiment, suggesting a path forward for modeling and predicting variability in the interpretation of legally relevant natural language.
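As a rough illustration of the zero-shot setup described above (not the study's actual materials, model, or pipeline), the sketch below prompts a chat-based LLM to estimate what fraction of readers would endorse a given interpretation of a contract clause; the clause, interpretation, and model name are hypothetical placeholders.

```python
# Minimal sketch (assumptions noted inline): zero-shot prompting an LLM to
# estimate interpretive consensus for a contract clause. The clause text,
# interpretation, and model name are hypothetical placeholders.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

clause = (
    "The Tenant shall return the premises in the same condition as received, "
    "reasonable wear and tear excepted."
)
interpretation = "Repainting scuffed walls counts as reasonable wear and tear."

prompt = (
    "Consider the following contract clause:\n"
    f"{clause}\n\n"
    "And the following interpretation:\n"
    f"{interpretation}\n\n"
    "Out of 100 ordinary readers, how many would agree with this "
    "interpretation? Answer with a single integer."
)

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[{"role": "user", "content": prompt}],
    temperature=0,
)

# Parse the model's integer estimate of agreement out of 100 readers.
predicted_agreement = int(response.choices[0].message.content.strip())
print(f"Predicted consensus: {predicted_agreement}/100 readers agree")
```

Exposing the model to data of the form collected in the experiment could, for instance, take the form of prepending labeled examples (clause, interpretation, observed agreement rate) to the prompt, or fine-tuning on such examples; the abstract does not specify which approach was used.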