Rotten Banks: Predicting Bank Failures After the Great Recession through Binary Classification
I investigate the determinants of bank failures after the 2007-2009 financial crisis in order to build a predictive model of bank failure.
I use two paradigms for prediction: accuracy maximization and the Neyman-Pearson paradigm. Accuracy maximization treats Type I and Type II errors as equally costly, so out-of-sample predictive accuracy is the main criterion for evaluation. The Neyman-Pearson paradigm sets an upper bound on Type I errors and minimizes Type II errors within that bound. In this case, the costs associated with Type I and Type II errors can differ.
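The Neyman-Pearson idea can be illustrated with a minimal sketch. It is not the procedure used in this study: it is a simple empirical thresholding rule, and the function names `np_threshold` and `error_rates`, the score inputs, and the label coding (0 for non-failing, 1 for failing) are assumptions made for illustration. A Type I error here means flagging a non-failing bank as failing, and a Type II error means missing a failing bank.

```python
import numpy as np

def np_threshold(scores_class0, alpha):
    """Return a threshold whose empirical Type I error on class-0 scores is <= alpha.

    A bank is classified as failing when its score is strictly above the threshold.
    Assumes 0 <= alpha < 1 and at least one class-0 score.
    """
    s = np.sort(np.asarray(scores_class0, dtype=float))
    n = len(s)
    # At most floor(alpha * n) class-0 scores may lie above the threshold.
    k = min(int(np.floor(alpha * n)), n - 1)
    return s[n - k - 1]

def error_rates(scores, labels, threshold):
    """Empirical Type I and Type II error rates at a given threshold."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pred_fail = scores > threshold
    type1 = np.mean(pred_fail[labels == 0])   # non-failing banks flagged as failing
    type2 = np.mean(~pred_fail[labels == 1])  # failing banks that were missed
    return type1, type2
```

In practice the threshold would be chosen on held-out class-0 data and the error rates reported on a separate test set; the empirical bound above carries no probabilistic guarantee on the population Type I error.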
I find that, because bank failures are rare events, many of the accuracy-maximizing classifiers tend to assign all observations to the class of non-failing banks. This achieves an out-of-sample predictive accuracy of 96 percent but misses all the failures. Two algorithms, post-Lasso logit and random forest, tend to have a relatively low level of Type II errors.
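The arithmetic behind this result can be sketched in a few lines. The sample below is hypothetical: it assumes 100 banks of which 4 fail, a failure share chosen only to match the 96 percent accuracy figure, and the function name `all_non_failing_metrics` is illustrative.

```python
import numpy as np

def all_non_failing_metrics(n_non_failing, n_failing):
    """Accuracy and Type II error rate of a classifier that never predicts failure."""
    labels = np.array([0] * n_non_failing + [1] * n_failing)
    pred = np.zeros_like(labels)           # every bank is predicted to survive
    accuracy = np.mean(pred == labels)
    type2 = np.mean(pred[labels == 1] == 0)  # share of failing banks missed
    return accuracy, type2

# Hypothetical sample: 96 surviving banks and 4 failing banks.
accuracy, type2 = all_non_failing_metrics(96, 4)
```

The trivial classifier scores high on accuracy precisely because the failing class is small, which is why accuracy alone is a poor criterion here.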
Classification under the Neyman-Pearson paradigm performs better at minimizing Type II errors while containing Type I errors. In out-of-sample testing, all of the algorithms identified at least 50 percent of the failing banks while keeping the false positive rate below ten percent. The lowest share of Type II errors was achieved by the AdaBoost algorithm (24 percent), while GLM with a LASSO penalty and sparse LDA did not perform much worse (Type II error rates of 27 percent).
My analysis produces additional substantive insights. I find that low profitability and a high proportion of impaired loans are the most important factors for bank failures.