Hatred is in the Eye of the Annotator: Hate Speech Classifiers Learn Human-Like Social Stereotypes

Abstract

Social stereotypes impact individuals' judgments about different social groups. One area where such stereotyping has a critical impact is in hate speech detection, in which human annotations of text are used to train machine learning models. Such models are likely to be biased in the same ways that humans are biased in their judgments of social groups. In this research, we investigate the effect of stereotypes of social groups on the performance of expert annotators in a large corpus of annotated hate speech. We also examine the effect of these stereotypes on unintended bias of hate speech classifiers. To this end, we show how language-encoded stereotypes, associated with social groups, lead to disagreements in identifying hate speech. Lastly, we analyze how inconsistencies in annotations propagate to a supervised classifier when human-generated labels are used to train a hate speech detection model.
