A method of credit evaluation modeling based on block-wise missing data

Authors: Qiujun Lan, Shan Jiang

Abstract

Missing data is a common problem in credit evaluation practice and can obstruct the development and application of evaluation models. Block-wise missing data, in which entire groups of features are absent for entire subsets of samples, is a particularly troublesome case. Based on a multi-task feature selection approach, this paper proposes a method called MMPFS for building credit evaluation models. The method consists of two main steps: (1) dividing the dataset into several nonoverlapping subsets according to their missing patterns, and (2) integrating multi-task feature selection with logistic regression to perform joint feature learning on all subsets. The proposed method has the following advantages: (1) missing data do not need to be handled in advance; (2) all available data can be fully used for model learning; (3) the information loss or bias introduced by general missing data processing methods is avoided; and (4) the risk of overfitting caused by redundant features is reduced. The implementation framework and algorithmic principle of the method are described, and three credit datasets from the UCI Machine Learning Repository are used to compare it with other commonly used missing data treatments. The results show that MMPFS can produce better credit evaluation models than data preprocessing methods such as sample deletion and data imputation.
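The two steps summarized above can be illustrated with a minimal NumPy sketch. Be aware that the abstract does not give MMPFS's exact objective or solver. This sketch therefore assumes an L2,1 (row-wise group-lasso) penalty on a logistic-regression weight matrix. That penalty is a common way to do multi-task feature selection, because it forces a feature to be kept or dropped jointly across all subsets. The sketch also assumes a proximal gradient solver and per-pattern scoring of new applicants. The function names `split_by_missing_pattern`, `fit_multitask_l21` and `predict_proba` are hypothetical, as are the hyperparameters `lam`, `step` and `n_iter`. The data are synthetic, not the UCI credit datasets.

```python
import numpy as np


def sigmoid(z):
    # Clip logits to avoid overflow warnings in exp for extreme values.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))


def split_by_missing_pattern(X, y):
    """Step (1): partition rows into nonoverlapping subsets, one per missing pattern.

    Returns a list of (observed_mask, X_sub, y_sub). X_sub keeps only the columns
    observed in that pattern, so no imputation or sample deletion is needed.
    """
    missing = np.isnan(X).astype(np.uint8)
    patterns, inverse = np.unique(missing, axis=0, return_inverse=True)
    inverse = inverse.ravel()  # keep the inverse index 1-D across NumPy versions
    tasks = []
    for k, pattern in enumerate(patterns):
        rows = inverse == k
        observed = pattern == 0
        tasks.append((observed, X[rows][:, observed], y[rows]))
    return tasks


def fit_multitask_l21(tasks, n_features, lam=0.1, step=0.5, n_iter=500):
    """Step (2), assumed form: one logistic regression per subset (task), coupled by
    the penalty lam * sum_j ||W[j, :]||_2 and solved by proximal gradient descent.
    """
    n_tasks = len(tasks)
    W = np.zeros((n_features, n_tasks))  # column t holds the weights of task t
    b = np.zeros(n_tasks)                # per-task intercepts (not penalized)
    for _ in range(n_iter):
        grad_W = np.zeros_like(W)
        grad_b = np.zeros(n_tasks)
        for t, (observed, Xt, yt) in enumerate(tasks):
            resid = sigmoid(Xt @ W[observed, t] + b[t]) - yt
            # Only observed features get a gradient; weights of missing ones stay 0.
            grad_W[observed, t] = Xt.T @ resid / len(yt)
            grad_b[t] = resid.mean()
        W -= step * grad_W
        b -= step * grad_b
        # Proximal step for the L2,1 norm: shrink each feature's row of weights jointly.
        norms = np.linalg.norm(W, axis=1, keepdims=True)
        W *= np.maximum(0.0, 1.0 - step * lam / np.maximum(norms, 1e-12))
    return W, b


def predict_proba(W, b, tasks, x):
    """Score one applicant with the task model that shares its missing pattern."""
    observed = ~np.isnan(x)
    for t, (obs, _, _) in enumerate(tasks):
        if np.array_equal(obs, observed):
            return float(sigmoid(x[observed] @ W[observed, t] + b[t]))
    raise ValueError("no task was trained for this missing pattern")


# Synthetic illustration: 5 features, only features 0 and 1 drive the label.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 5))
y = (rng.random(600) < sigmoid(2.0 * X[:, 0] - 1.5 * X[:, 1])).astype(float)
X[200:400, 3:5] = np.nan  # block B lacks features 3 and 4
X[400:600, 2] = np.nan    # block C lacks feature 2

tasks = split_by_missing_pattern(X, y)
W, b = fit_multitask_l21(tasks, n_features=X.shape[1])
print("subset sizes:", [len(yt) for _, _, yt in tasks])
print("row norms of W:", np.round(np.linalg.norm(W, axis=1), 3))
```

Rows of `W` whose norm is driven to zero correspond to features dropped for every subset at once. This is how a shared penalty of this kind can limit overfitting from redundant features while each subset still uses all of its own observed columns.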

Keywords: Missing data, Credit evaluation, Data mining, Multi-task learning

Official paper URL: https://doi.org/10.1007/s10489-021-02225-5