UC Berkeley Electronic Theses and Dissertations

Automatic Grading and Feedback for Students’ Short Written Responses

Abstract

This work focuses on automatic grading and revision feedback for open-ended (OE) student responses on mathematics and science topics. The dissertation is divided into three closely related chapters, each building upon the work of the previous chapter(s). In the first chapter, we focus on improving the performance of an Automatic Short Answer Grading (ASAG) model, where performance is measured by how closely the machine ratings match those of a human rater. We experiment with incorporating domain-related text into the model from sources such as the item text, context, and rubric, and investigate how this might affect the generalizability of the model by testing the ASAG model on out-of-training-sample questions. Further, we examine whether representing not only the text but also the ordinal structure of a scoring rubric is useful, applying a graph sampling method before incorporating the rubric text into the ASAG model.
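A minimal sketch of the input-construction idea described above, assuming a BERT-style grader fine-tuned on (input, human score) pairs; the field names and separator convention are assumptions, and the graph sampling of the rubric's ordinal structure is not shown:

```python
# Sketch (not the dissertation's implementation): building an ASAG model
# input that incorporates domain-related text alongside the student response.

from dataclasses import dataclass

@dataclass
class Item:
    text: str     # the question as shown to the student
    context: str  # surrounding passage or problem setup
    rubric: str   # flattened rubric text (score levels and descriptors)

def build_asag_input(item: Item, response: str, sep: str = " [SEP] ") -> str:
    """Concatenate domain text with the response for a BERT-style grader.

    Conditioning on the item text, context, and rubric, rather than the
    response alone, is one way the model could generalize to
    out-of-training-sample questions.
    """
    return sep.join([item.text, item.context, item.rubric, response])

item = Item(
    text="Why does the moon appear to change shape during the month?",
    context="Unit: the Earth-Sun-Moon system.",
    rubric="2: mentions relative positions and reflected sunlight; "
           "1: mentions only one; 0: mentions neither.",
)
print(build_asag_input(item, "Because we see different sunlit portions."))
```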

In the second chapter, we address the lack of explainability in typical ASAG models. We train a deep reinforcement learning (RL) agent to revise a student's response by inserting key phrases or deleting portions of the response, so as to achieve a high grade from an ASAG model in the fewest revisions. The intent is that the agent's revisions may help explain what was missing from, or unnecessarily included in, a student's response. We also examine the intelligibility of a Neural Additive Model (NAM) that uses the RL agent's key phrases as input features. NAMs combine the explainability of additive models with a neural network's ability to approximate high-dimensional functions. Further, we use the RL agent to expose shortcomings in the ASAG model by finding revisions that achieve a high grade from the automatic grader but may not be considered good responses by a human rater.
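A minimal sketch of the revision environment described above, under stated assumptions: `grade` is a stub standing in for the trained ASAG model, the key-phrase bank is hypothetical, and the insert/delete action encoding and reward shaping are illustrative choices, not the dissertation's design:

```python
# Sketch: an environment where an agent inserts key phrases or deletes words
# and is rewarded for raising the ASAG grade in as few revisions as possible.

KEY_PHRASES = ["reflected sunlight", "relative positions"]  # assumed bank

def grade(response: str) -> float:
    """Stub for the ASAG model: fraction of key phrases present."""
    return sum(p in response for p in KEY_PHRASES) / len(KEY_PHRASES)

class RevisionEnv:
    def __init__(self, response: str, max_steps: int = 5, target: float = 1.0):
        self.response, self.steps = response, 0
        self.max_steps, self.target = max_steps, target

    def step(self, action):
        """action = ('insert', phrase) or ('delete', word_index)."""
        kind, arg = action
        words = self.response.split()
        if kind == "insert":
            words.append(arg)
        elif kind == "delete" and 0 <= int(arg) < len(words):
            del words[int(arg)]
        self.response = " ".join(words)
        self.steps += 1
        score = grade(self.response)
        done = score >= self.target or self.steps >= self.max_steps
        reward = score - 0.1 * self.steps  # small penalty per revision
        return self.response, reward, done

env = RevisionEnv("The moon changes because of shadows.")
state, reward, done = env.step(("insert", "reflected sunlight"))
print(state, reward, done)
```

Because the reward comes directly from the grader, the same loop can surface adversarial revisions: sequences of edits that the ASAG model scores highly but a human rater would not.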

In the third chapter, we focus on whether the RL agent's revisions can provide feedback that encourages a student to revise their own short response. We collected data with a randomized controlled design from students interacting with an online tutoring system. Students who received an intervention based on the RL agent's revisions are compared with students who received an intervention from ChatGPT, a popular chatbot, and with students who received a generic intervention. We examine student perceptions of the feedback, how prior knowledge interacts with the interventions, and the accuracy of ChatGPT's feedback.
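A minimal sketch of the three-arm random assignment implied by this design; the condition names, balancing-by-shuffle scheme, and fixed seed are all illustrative assumptions rather than the study's actual protocol:

```python
# Sketch: randomly assign students to one of three feedback conditions,
# keeping arm sizes as balanced as possible.

import random

CONDITIONS = ["rl_agent_feedback", "chatgpt_feedback", "generic_feedback"]

def assign_conditions(student_ids, seed: int = 0) -> dict:
    """Shuffle students, then deal them round-robin into the three arms."""
    rng = random.Random(seed)
    ids = list(student_ids)
    rng.shuffle(ids)
    return {sid: CONDITIONS[i % len(CONDITIONS)] for i, sid in enumerate(ids)}

print(assign_conditions(range(9)))
```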
