Imbalanced Binary Classification On Hospital Readmission Data With Missing Values
- Author(s): Zhang, Hui
- Advisor(s): Wu, Yingnian
- et al.
Hospital readmission is a costly, undesirable, and often preventable outcome of inpatient care. Early prediction of readmission can help prevent life-threatening events and reduce healthcare costs. However, readmission data usually have an imbalanced class distribution and high rates of missing values, both of which need to be handled carefully before building classification models. In this paper, we investigate the prediction of hospital readmission on a dataset with a high percentage of missing values and a class imbalance problem. Different methods are applied to impute missing values in the categorical and numerical variables. In addition, SMOTE (Synthetic Minority Over-sampling Technique) and cost-sensitive learning are each combined with different classification methods (LASSO logistic regression, random forest, and gradient boosting) to explore which combination yields the best classification performance on the readmission data. Total misclassification cost and area under the ROC curve are used as evaluation metrics for model comparison. Our results show that SMOTE causes overfitting on our readmission data and that cost-sensitive learning outperforms SMOTE in terms of total misclassification cost.
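To make the total misclassification cost metric concrete, the following is a minimal sketch and not the thesis's implementation. It assumes a hypothetical cost of 5 for a missed readmission (false negative) and 1 for a false alarm (false positive); the abstract does not state the cost values used. The function names, toy labels, and predicted probabilities are likewise illustrative. Cost-sensitive learning is shown here in one common form, shifting the decision threshold to minimize expected cost; the thesis may instead use class weights or another variant.

```python
def total_misclassification_cost(y_true, y_pred, cost_fn=5.0, cost_fp=1.0):
    """Total cost = cost_fn * (#false negatives) + cost_fp * (#false positives).

    cost_fn and cost_fp are hypothetical values chosen for illustration.
    """
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return cost_fn * fn + cost_fp * fp


def predict_with_threshold(probs, threshold):
    """Label a case positive (readmitted) when its probability exceeds the threshold."""
    return [1 if p > threshold else 0 for p in probs]


def cost_sensitive_threshold(cost_fn=5.0, cost_fp=1.0):
    """Threshold that minimizes expected cost for calibrated probabilities.

    Predicting positive is cheaper when p * cost_fn > (1 - p) * cost_fp,
    i.e. when p > cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)


# Toy data (made up): 2 readmissions among 10 patients.
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1]
probs = [0.05, 0.10, 0.20, 0.15, 0.30, 0.08, 0.12, 0.40, 0.25, 0.70]

# Default 0.5 threshold: misses the readmission scored 0.40.
pred_default = predict_with_threshold(probs, 0.5)
cost_default = total_misclassification_cost(y_true, pred_default)

# Cost-sensitive threshold (1/6 here): catches both readmissions, with 3 false alarms.
pred_cs = predict_with_threshold(probs, cost_sensitive_threshold())
cost_cs = total_misclassification_cost(y_true, pred_cs)

print(cost_default)  # 5.0
print(cost_cs)       # 3.0
```

On this toy example the lower threshold trades one expensive false negative for three cheap false positives, which reduces the total cost under the assumed 5:1 cost ratio.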