Spurious correlations have been found to be an important factor in explaining model performance on various NLP tasks (e.g., gender or racial artifacts), and are often considered “shortcuts” to the actual task.
However, humans similarly tend to make quick (and sometimes wrong) predictions based on societal and cognitive presuppositions. In this work we address the question: can we quantify the extent to which model biases reflect human behaviour?
Answering this question will help shed light on model performance and enable meaningful comparisons with humans. We approach this question through the lens of the dual-process theory of human decision-making, which differentiates between an automatic, unconscious (and sometimes biased) “fast system” and a “slow system” that, when triggered, may revisit earlier automatic reactions.
We make several observations from two crowdsourcing experiments on gender bias in coreference resolution, using self-paced reading to study the “fast” system and question answering under a constrained time setting to study the “slow” system.
On real-world data, humans make ∼3% more gender-biased decisions than models, while on synthetic data, models are ∼12% more biased.
We make all of our code and data publicly available.