Skip to main content
eScholarship
Open Access Publications from the University of California

UCLA

UCLA Previously Published Works bannerUCLA

How to Apply Variable Selection Machine Learning Algorithms With Multiply Imputed Data: A Missing Discussion

Abstract

Psychological researchers often use standard linear regression to identify relevant predictors of an outcome of interest, but challenges emerge with incomplete data and growing numbers of candidate predictors. Regularization methods like the LASSO can reduce the risk of overfitting, increase model interpretability, and improve prediction in future samples; however, handling missing data when using regularization-based variable selection methods is complicated. Using listwise deletion or an ad hoc imputation strategy to deal with missing data when using regularization methods can lead to loss of precision, substantial bias, and a reduction in predictive ability. In this tutorial, we describe three approaches for fitting a LASSO when using multiple imputation to handle missing data and illustrate how to implement these approaches in practice with an applied example. We discuss implications of each approach and describe additional research that would help solidify recommendations for best practices. (PsycInfo Database Record (c) 2023 APA, all rights reserved).

Many UC-authored scholarly publications are freely available on this site because of the UC's open access policies. Let us know how this access is important for you.

Main Content
For improved accessibility of PDF content, download the file to your device.
Current View