Serving CS Formative Feedback on Assessments Using Simple and Practical Teacher-Bootstrapped Error Models
- Author(s): Stephens-Martinez, Kristin;
- Advisor(s): Fox, Armando;
- et al.
The demand for computing education at the post-secondary level is growing. However, teaching staff hiring is not keeping pace, leading to increasing class sizes. As computers become ubiquitous, classes are following suit by increasing their use of technology. These two defining factors of scaled classes require us to reconsider teaching practices that originated in small classes with little technology. Rather than seeing scaled classes as a problem that needs management, we propose that they are an opportunity: they let us collect and analyze large, high-dimensional data sets and conduct experiments at scale.
One way classes are increasing their use of technology is by moving content delivery and assessment administration online. Massive Open Online Courses (MOOCs) have taken this to an extreme by delivering all material online, having no face-to-face interaction, and allowing the class to include thousands of students at once. To understand how this changes the information needs of the teacher, we surveyed MOOC teachers and compared our results to prior work that ran similar surveys among teachers of smaller online courses. While our results were similar, we did find that the MOOC teachers surveyed valued qualitative data – such as forum activity and student surveys – more than quantitative data such as grades. One potential reason for these results is that teachers found quantitative data insufficient to monitor class dynamics, such as problems with course material and student thought processes. They needed a source of data that required less upfront knowledge of what the teacher wanted to look for and how to find it. With such data, their understanding of the students and class situation could be more holistic.
Since qualitative data such as forum activity and surveys have an inherent selection bias, we focused on required, constructed-response assessments in the course. This choice reduced selection bias, required less upfront knowledge, and focused attention on measuring how well students are learning the material. Also, since MOOCs have a high proportion of auditors, we moved to studying a large local class to have a complete sample.
We applied qualitative and quantitative methods to analyze wrong answers from constructed-response, code-tracing question sets delivered through an automated grading system. Using emergent coding, we defined tags to represent ways that a student might arrive at a wrong answer and applied them to our data set. Since what we identified as frequent wrong answers occurred at a much higher rate than infrequent wrong answers, we found that analyzing only these frequent wrong answers provides a representative overview of the data. In addition, a content expert is more likely to be able to tag a frequent wrong answer than a random wrong answer.
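As an illustration of why frequent wrong answers can stand in for the whole data set, the following minimal Python sketch counts wrong answers to a single question and reports what share of all wrong submissions the frequent ones cover. The function name, the `min_count` threshold, and the sample submissions are hypothetical and are not taken from the dissertation's data or tooling.

```python
from collections import Counter


def frequent_answer_coverage(wrong_answers, min_count):
    """Return the frequent wrong answers and the share of submissions they cover.

    wrong_answers: list of wrong-answer strings for one question (hypothetical format).
    min_count: assumed cutoff; an answer seen at least this often counts as frequent.
    """
    counts = Counter(wrong_answers)
    frequent = {answer: n for answer, n in counts.items() if n >= min_count}
    total = sum(counts.values())
    coverage = sum(frequent.values()) / total if total else 0.0
    return frequent, coverage


# Invented submissions for one code-tracing question.
submissions = ["3", "3", "3", "None", "3", "None", "[1, 2]", "Error", "3", "None"]
frequent, coverage = frequent_answer_coverage(submissions, min_count=3)
print(frequent, coverage)
```

In this invented sample, two of the four distinct wrong answers account for most of the submissions, which is the kind of skew that would make tagging only the frequent answers worthwhile.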
Using the wrong-answer-to-tag(s) association, we built a student error model and designed a hint intervention within the automated grading system. We deployed an in situ experiment in a large introductory computer science course to understand the effectiveness of parameters in the model and compared two different kinds of hints: reteaching and knowledge integration. A reteaching hint re-explained the concept(s) associated with the tag. A knowledge integration hint focused on pushing the student in the right direction without re-explaining anything, such as reminding them of a concept or asking them to compare two aspects of the assessment. We found it was straightforward to implement and deploy our intervention experiment because of the existing class technology. In addition, for our model, we found co-occurrence provides useful information to propagate tags to wrong answers that we did not inspect. However, we were unable to find evidence that our hints improved student performance on post-test questions compared to no hints at all. Therefore, we performed a preliminary, exploratory analysis to understand potential reasons why our results are null and to inform future work.
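The abstract does not specify how co-occurrence was used, so the following Python sketch shows only one plausible reading: an uninspected wrong answer inherits the tag of the tagged answers that the same students also submitted. The data layout, the `min_cooccurrence` threshold, the voting rule, and the tag names are all assumptions made for illustration, not the dissertation's model.

```python
from collections import defaultdict
from itertools import combinations


def propagate_tags(student_answers, known_tags, min_cooccurrence=2):
    """Assign a tag to untagged wrong answers based on co-occurrence (assumed rule).

    student_answers: one list per student of the wrong answers they submitted.
    known_tags: dict mapping an inspected wrong answer to its set of tags.
    min_cooccurrence: assumed minimum number of students sharing both answers.
    """
    # Count how many students submitted each pair of distinct wrong answers.
    cooccur = defaultdict(lambda: defaultdict(int))
    for answers in student_answers:
        for a, b in combinations(sorted(set(answers)), 2):
            cooccur[a][b] += 1
            cooccur[b][a] += 1

    propagated = {}
    for answer, neighbours in cooccur.items():
        if answer in known_tags:
            continue  # already inspected by the content expert
        votes = defaultdict(int)
        for other, n in neighbours.items():
            if n >= min_cooccurrence:
                for tag in known_tags.get(other, ()):
                    votes[tag] += n
        if votes:
            # Highest vote wins; ties are broken alphabetically for determinism.
            propagated[answer] = max(sorted(votes), key=votes.get)
    return propagated


# Invented example: "4" was never inspected but often appears alongside "3".
student_answers = [["3", "4"], ["3", "4"], ["None", "4"], ["3"], ["None", "5"]]
known_tags = {"3": {"off-by-one"}, "None": {"missing-return"}}
print(propagate_tags(student_answers, known_tags))
```

Under this assumed rule, an answer that co-occurs with tagged answers too rarely receives no tag at all, which keeps the propagated tags conservative.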
We believe scaled classes are a prime opportunity to study learning. This work is an example of how to take advantage of this chance by first collecting and analyzing data from a scaled class and then deploying a scaled in situ intervention by using the scaled class’s technology. With this work, we encourage other researchers to take advantage of scaled classes and hope it can serve as a starting point for how to do so.