Machine Learning with Keyword Analysis for Supporting Holistic Undergraduate Admissions in Computer Science
College admissions processes have traditionally relied on academic measures such as GPA and standardized test scores, along with supplementary application materials. In California, the passage of Proposition 209 in 1996 prohibited the consideration of gender and ethnicity in admissions decisions. In an attempt to increase diversity, many universities adopted holistic review to evaluate applicants' abilities fairly, both inside and outside the classroom. However, holistic review relies more heavily on subjective assessment, which leaves more room for human error and bias. Machine Learning should therefore be explored as a way to assist reviewers while also reducing potential bias.
Little work has evaluated Machine Learning applications in undergraduate holistic review. In this thesis, we compare the performance of supervised classifiers that could help verify the scores that application reviewers assign. We train our models on a dataset of applicants to the Computer Science department at the University of California, Irvine. The collected data include demographics, academic history, high school information, and essay responses. The best-performing classifier was Logistic Regression trained on a dataset that included all numerical and categorical variables along with keyword bigrams extracted from the text. This classifier achieved an accuracy of 0.789, the highest among the models evaluated. Using feature coefficient analysis, we observed the effects of academic achievement, extracurricular involvement, and writing content on the model's predictions.