Predicting Students’ English Performance with Traditional Statistical Modeling and Machine Learning: An Analysis of the China Education Panel Survey (CEPS)
eScholarship
Open Access Publications from the University of California

UC Santa Cruz

UC Santa Cruz Electronic Theses and Dissertations

Abstract

With the global expansion of English teaching, factors related to language achievement have recently garnered significant attention (Onwuegbuzie et al., 2000; Phillipson & Phillipson, 2007). This research aims to contribute to the literature on English achievement in the Chinese context by examining the influence of specific key variables (e.g., students’ grade level, parent involvement, teacher characteristics, school demographics) on English achievement scores. The data are taken from the China Education Panel Survey (CEPS), a large-scale, nationally representative, longitudinal survey starting with two cohorts (7th and 9th graders enrolled in the 2013-2014 academic year). In addition to exploring English achievement, the study contributes to the literature on quantitative methodologies in educational research by exploring the use of statistical modeling and machine learning in studies of academic achievement. Analyses from both multilevel modeling and Support Vector Regression (SVR) revealed that students’ English performance was largely explained by their Chinese language performance scores, cognitive aptitude scores, self-perceived educational expectations, and parents’ expectations of their children’s academic performance and future educational achievement. The current study corroborates previous research demonstrating that achievement in one’s native language is associated with achievement in languages learned later in life (Ortega, 2014). Both multilevel modeling and SVR were shown to be useful methods for predicting English achievement, suggesting that educators and researchers may benefit from using both approaches to better understand the broader construct of academic achievement.
