As enrollments and class sizes in postsecondary institutions have increased, instructors have sought automated means to identify students who are at risk of failing a course. This identification must happen early enough in the term for instructors to assist those students before they fall irreparably behind. This dissertation proposes methods for identifying at-risk students early and characterizes the behaviors of such students.
The first part of this work describes a modeling methodology that predicts students' final exam scores in the third week of the term using different sets of easy-to-collect data, including student clicker responses, prerequisite course grades, online quiz scores, and assignment grades. Models built with several machine learning techniques are trained on one term of a course and used to predict outcomes in subsequent terms. This allows predictions to be made across terms in a natural setting, with different final exams and minor changes to course content. We applied this modeling technique to five courses across the computer science curriculum, taught by three instructors at two institutions. The models perform consistently across courses in the curriculum and across institutions. In addition, prerequisite course grades and clicker responses are more predictive than online quiz scores and assignment grades.
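The cross-term protocol described above (train on one term, predict final exam scores for a later term) can be sketched as follows. This is a minimal illustration, not the dissertation's implementation: it assumes ordinary least squares as a stand-in for the machine learning techniques used, two hypothetical week-3 features (a prerequisite course grade on a 4-point scale and the fraction of clicker questions answered correctly), toy data generated from a known linear rule, and a hypothetical at-risk cutoff of 40. The names `fit_linear`, `predict`, and `flag_at_risk` are illustrative only.

```python
"""Hypothetical sketch: train on one term, predict final exam scores in a later term."""
from typing import List, Sequence

Row = Sequence[float]


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A w = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        # Swap in the row with the largest entry in this column for stability.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= factor * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_linear(X: Sequence[Row], y: Sequence[float]) -> List[float]:
    """Ordinary least squares via the normal equations; w[0] is the intercept."""
    rows = [[1.0] + list(r) for r in X]  # prepend an intercept column
    k = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]  # X^T X
    b = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]  # X^T y
    return solve(A, b)


def predict(w: Sequence[float], X: Sequence[Row]) -> List[float]:
    """Apply the fitted weights to each student's feature row."""
    return [w[0] + sum(wi * xi for wi, xi in zip(w[1:], r)) for r in X]


def flag_at_risk(preds: Sequence[float], cutoff: float) -> List[int]:
    """Indices of students whose predicted score falls below the cutoff."""
    return [i for i, p in enumerate(preds) if p < cutoff]


# Toy data (NOT from the dissertation). Features: (prerequisite grade, week-3
# clicker fraction correct). Scores follow 10 + 5 * prereq + 40 * clicker exactly.
TERM1_X = [(3.0, 0.8), (2.0, 0.5), (4.0, 0.9), (1.0, 0.3), (3.5, 0.6)]
TERM1_Y = [57.0, 40.0, 66.0, 27.0, 51.5]
TERM2_X = [(1.5, 0.4), (3.8, 0.85)]  # later term: features only, exam not yet taken

if __name__ == "__main__":
    w = fit_linear(TERM1_X, TERM1_Y)  # train on term 1 only
    preds = predict(w, TERM2_X)  # score term-2 students in week 3
    print([round(p, 2) for p in preds])
    print(flag_at_risk(preds, cutoff=40.0))
```

Fitting only on term-1 rows and scoring term-2 rows mirrors the train-on-one-term, predict-the-next setup; in this toy run the first term-2 student falls below the hypothetical cutoff and is flagged. In practice, predicted scores would later be compared against the actual term-2 exam scores to evaluate the model.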
The second part of this work seeks to understand which factors contribute to these students' difficulties, since a better understanding of their characteristics may help us assist them more effectively. It examines the characteristics of lower- and higher-performing students through interviews with students from an introductory computing class. We identify several relevant areas of student behavior: how students approach studying for exams, how they approach completing programming assignments, whether they sought help after identifying misunderstandings, how and from whom they sought help, and how they reflected on assignments after submitting them. Particular behaviors within each area are coded, and differences between groups of students are identified.
The dissertation concludes by outlining future work on automating the identification process for use by the general public and on designing effective student intervention frameworks.