eScholarship
Open Access Publications from the University of California


UC Davis Electronic Theses and Dissertations

Using Shrinkage Methods for Model Selection and Improved Predictions: An Application to Time to Degree for Transfer Students

Abstract

This work applies machine learning (ML) techniques within a linear-model framework to select the best set of explanatory variables from a potentially large set. We paid particular attention to LASSO and explored how it improves a model's prediction accuracy. We also used an extension, Islasso, which allows hypothesis testing on parameters estimated by minimizing a penalized objective function. We explored these techniques on readily available datasets and on a novel dataset of 4,091 observations of UC Davis transfer students with 126 variables. Our goal was to understand UC Davis transfer students' performance, measured by time to degree. To highlight predictive differences, we divided the variables into two subsets: academic and personal. We concluded that academic variables are far more important for predicting students' time to degree. LASSO applied to the academic subset produced the fewest misclassification errors and the lowest AIC, improving on the unpenalized model. Islasso likewise showed that academic variables are the most important predictors of late graduation among transfer students. LASSO helped us understand which variables belonged in the model and confirmed many of our initial presumptions about which variables should enter it. Finally, we showed that Islasso could be an excellent compromise for closing the gap between inference and selection, since it performs variable selection and yields reliable confidence intervals for a model's coefficients simultaneously.
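The LASSO-versus-unpenalized comparison described above can be sketched as follows. This is a minimal illustration, not the author's code. It assumes time to degree has been dichotomized into late vs. on-time graduation and uses synthetic data in place of the UC Davis dataset. It fits L1-penalized logistic regression by proximal gradient descent (ISTA). As a further assumption, it counts the intercept plus the nonzero coefficients as the model's degrees of freedom in AIC. The penalty `lam = 0.05` is an arbitrary illustrative value; in practice it would usually be chosen by cross-validation. Islasso's inference step is not reproduced here.

```python
import numpy as np


def sigmoid(eta):
    # tanh form of the logistic function; avoids overflow for large |eta|
    return 0.5 * (1.0 + np.tanh(0.5 * eta))


def soft_threshold(z, t):
    """Proximal operator of t * ||z||_1: shrink each entry toward zero by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def fit_l1_logistic(X, y, lam, n_iter=5000):
    """Minimize mean log-loss + lam * ||beta||_1 by proximal gradient (ISTA).

    The intercept b0 is not penalized; lam = 0 gives the unpenalized fit.
    """
    n, p = X.shape
    b0, beta = 0.0, np.zeros(p)
    # (n + ||X||_2^2) / (4n) upper-bounds the Lipschitz constant of the
    # gradient (intercept column included), so 1 / bound is a safe step size.
    step = 4.0 * n / (n + np.linalg.norm(X, 2) ** 2)
    for _ in range(n_iter):
        resid = sigmoid(b0 + X @ beta) - y          # gradient of log-loss wrt eta
        b0 -= step * resid.mean()                   # plain gradient step (no penalty)
        beta = soft_threshold(beta - step * (X.T @ resid) / n, step * lam)
    return b0, beta


def evaluate(X, y, b0, beta):
    """Return (misclassification rate, AIC, number of estimated parameters)."""
    eta = b0 + X @ beta
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))  # Bernoulli log-likelihood
    # Assumption: df = intercept + number of nonzero coefficients.
    k = 1 + np.count_nonzero(beta)
    error = np.mean((eta >= 0.0) != (y == 1.0))        # classify at p = 0.5
    return error, 2 * k - 2 * loglik, k


# Hypothetical synthetic data: 400 "students", 20 candidate predictors,
# of which only the first 3 actually affect the binary outcome.
rng = np.random.default_rng(0)
X = rng.standard_normal((400, 20))
true_beta = np.zeros(20)
true_beta[:3] = [1.5, -1.0, 0.8]
y = (rng.random(400) < sigmoid(X @ true_beta)).astype(float)

results = {}
for name, lam in [("unpenalized", 0.0), ("lasso", 0.05)]:
    b0, beta = fit_l1_logistic(X, y, lam)
    results[name] = (b0, beta) + evaluate(X, y, b0, beta)
    err, aic, k = results[name][2:]
    print(f"{name:12s} error={err:.3f} AIC={aic:.1f} params={k}")
```

With the penalty, many of the 17 irrelevant predictors are typically set exactly to zero while the three true signals survive. This is the variable-selection behavior the abstract relies on. Whether the penalized model also wins on AIC or misclassification depends on the data and on the choice of `lam`.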
