Hatred is in the Eye of the Annotator: Hate Speech Classifiers Learn Human-Like Social Stereotypes

Abstract

Social stereotypes impact individuals' judgments about different social groups. One area where such stereotyping has a critical impact is in hate speech detection, in which human annotations of text are used to train machine learning models. Such models are likely to be biased in the same ways that humans are biased in their judgments of social groups. In this research, we investigate the effect of stereotypes of social groups on the performance of expert annotators in a large corpus of annotated hate speech. We also examine the effect of these stereotypes on unintended bias of hate speech classifiers. To this end, we show how language-encoded stereotypes, associated with social groups, lead to disagreements in identifying hate speech. Lastly, we analyze how inconsistencies in annotations propagate to a supervised classifier when human-generated labels are used to train a hate speech detection model.
