Approaches for credit scorecard calibration: An empirical analysis

Authors:

Highlights:

Abstract

Financial institutions use credit scorecards for risk management. A scorecard is a data-driven model for predicting default probabilities. Scorecard assessment typically concentrates on how well a scorecard discriminates between good and bad risks. Whether predicted and observed default probabilities agree (i.e., calibration) is an equally important yet often overlooked dimension of scorecard performance. Surprisingly, no attempt has been made to systematically explore different calibration methods and their implications in credit scoring. The goal of this paper is to integrate previous work on probability calibration, to re-introduce available calibration techniques to the credit scoring community, and to empirically examine the extent to which they improve scorecards. More specifically, using real-world credit scoring data, we first develop scorecards using different classifiers, next apply calibration methods to the classifier predictions, and then measure the degree to which they improve calibration. To evaluate performance, we measure the accuracy of predictions in terms of the Brier Score before and after calibration, and employ repeated measures analysis of variance to test for significant differences between group means. Furthermore, we check calibration using reliability plots and decompose the Brier Score to clarify the origin of performance differences across calibrators. The observed results suggest that post-processing scorecard predictions with a calibrator is beneficial: calibrators improve scorecard calibration while discriminatory ability remains unaffected. Generalized additive models are particularly suitable for calibrating classifier predictions.
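The pipeline the abstract outlines (train a scorecard, post-process its scores with a calibrator fitted on held-out data, then compare calibration and discrimination before and after) can be sketched as follows. This is a minimal illustration using scikit-learn; the synthetic data, the random forest scorecard, and the two calibrators shown here (Platt scaling and isotonic regression) are stand-in assumptions, not the paper's exact setup, which also covers GAM-based calibration and repeated measures ANOVA.

```python
# Minimal sketch: calibrate scorecard predictions and compare Brier Score / AUC.
# All modeling choices below are illustrative, not the paper's configuration.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.isotonic import IsotonicRegression
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss, roc_auc_score

# Synthetic, class-imbalanced stand-in for a real-world credit scoring dataset.
X, y = make_classification(n_samples=5000, n_features=20,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0)
X_cal, X_test, y_cal, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=0)

# Step 1: build a scorecard (here a random forest; the paper compares
# several classifiers).
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
p_cal = clf.predict_proba(X_cal)[:, 1]    # held-out scores for fitting calibrators
p_test = clf.predict_proba(X_test)[:, 1]  # scores to be evaluated

# Step 2: fit calibrators on the held-out scores only.
platt = LogisticRegression().fit(p_cal.reshape(-1, 1), y_cal)     # Platt scaling
iso = IsotonicRegression(out_of_bounds="clip").fit(p_cal, y_cal)  # isotonic regression

# Step 3: compare calibration (Brier Score) and discrimination (AUC)
# before and after post-processing.
for name, p in [("raw", p_test),
                ("platt", platt.predict_proba(p_test.reshape(-1, 1))[:, 1]),
                ("isotonic", iso.predict(p_test))]:
    print(f"{name:8s} Brier={brier_score_loss(y_test, p):.4f} "
          f"AUC={roc_auc_score(y_test, p):.4f}")
```

Because both calibrators apply a monotone transformation to the raw scores, rank-based measures such as the AUC are left essentially unchanged, which is consistent with the paper's finding that discrimination is unaffected. The Brier Score can additionally be decomposed (Murphy's decomposition into reliability, resolution, and uncertainty) to locate where the calibrators differ, as the abstract describes.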

Keywords: Credit scoring, Classification, Calibration, Probability of default

Article history: Received 22 February 2017, Revised 23 July 2017, Accepted 25 July 2017, Available online 26 July 2017, Version of Record 13 September 2017.

DOI: https://doi.org/10.1016/j.knosys.2017.07.034