Predictive Modelling for Loan Defaults
- Author(s): Zhu, Leon
- Advisor(s): Wu, Ying Nian
- et al.
In this paper we explore how predictive modelling can be applied to loan default prediction. Predicting whether a loan will be fully paid or will default is a binary classification problem. We compare the performance of several machine learning models: logistic regression, random forest, neural network, extreme gradient boosting (XGBoost), and an ensemble. As with many industry datasets, class imbalance is an issue: the data cannot be used as-is, because a model trained on it will be biased toward the majority class. To address this, we explore sampling techniques, such as SMOTE and ADASYN, and cost-sensitive learning techniques, such as class weights. Using precision, recall, G-mean, and F-measure, as well as the area under the precision-recall curve (AUC), to examine the results of each model, we found that no balancing method is consistently superior. While all models performed well after a balancing method was applied, XGBoost with class weights performed best. A robust model can potentially be leveraged to optimize profits and produce a greater return on investment. Using the best model, return on investment improved by 83%.
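To illustrate why accuracy alone is misleading under class imbalance, and what a class-weight scheme might look like, the following is a minimal sketch in plain Python. It is not the paper's code. The function names `imbalance_metrics` and `balanced_class_weights` are hypothetical, the toy labels are made up for illustration, G-mean is assumed to be the square root of recall times specificity (the abstract does not give its definition), and the weights use the common "balanced" heuristic n_samples / (n_classes * class_count), which may differ from the weighting used in the paper.

```python
import math


def imbalance_metrics(y_true, y_pred, positive=1):
    """Compute imbalance-aware metrics from true and predicted labels.

    Assumption: G-mean = sqrt(recall * specificity).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard every ratio against an empty denominator.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "g_mean": math.sqrt(recall * specificity),
    }


def balanced_class_weights(y):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = {}
    for label in y:
        counts[label] = counts.get(label, 0) + 1
    n_samples, n_classes = len(y), len(counts)
    return {label: n_samples / (n_classes * c) for label, c in counts.items()}


if __name__ == "__main__":
    # Toy data (illustrative only): 90 fully paid loans (0), 10 defaults (1).
    y_true = [0] * 90 + [1] * 10

    # A model that always predicts "fully paid" looks accurate
    # but never catches a default.
    always_paid = [0] * 100
    print(imbalance_metrics(y_true, always_paid))

    # The minority class receives a proportionally larger weight.
    print(balanced_class_weights(y_true))
```

On the toy data, the always-negative predictor scores high accuracy while recall, F-measure, and G-mean are all zero, which is the failure mode that the balancing methods above are meant to counter. In practice, a weight dictionary of this kind would be passed to a model's class-weight option rather than applied by hand.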