Mitigating Gender and L1 Biases in Automated English Speaking Assessment

Abstract

Automated assessment using Natural Language Processing (NLP) has the potential to make English speaking assessments more reliable, authentic, and accessible. Yet without careful examination, NLP may exacerbate social prejudices based on gender or native language (L1). Current NLP-based assessments are prone to such biases, yet research and documentation are scarce. Given the high-stakes nature of English speaking assessment, it is imperative that tests be fair for all examinees, regardless of gender or L1 background. Through a series of three studies, this project addresses the need for more thorough investigations of bias in English speaking assessment. Study 1 examines biases in automated transcription, a key component of automated speaking assessment. Study 2 focuses on a specific type of bias known as differential item functioning (DIF), determining which patterns of DIF are present in human rater scores and whether these patterns are exacerbated by BERT, a pretrained large language model (LLM). Lastly, Study 3 compares two approaches to mitigating DIF using LLMs. Results from Study 1 indicate that there are indeed biases in automated transcription; however, these do not translate into biased speaking scores. Study 2 shows that BERT does exacerbate human rater biases, although the effect size is small. Finally, Study 3 demonstrates that it is possible to debias both human and automated scores; however, both approaches have limitations, particularly when the source of DIF is unknown.
