A novel ensemble feature selection method by integrating multiple ranking information combined with an SVM ensemble model for enterprise credit risk prediction in the supply chain

Authors:

Highlights:

Abstract

Enterprise credit risk prediction in the supply chain context is an important step in decision making and in issuing early warnings of credit crises, and improving its prediction performance is a focus of both academia and industry. Two issues affect that performance: feature selection and class imbalance. Redundant and irrelevant features make the prediction model harder to train, cause overfitting and reduce prediction performance, whereas class imbalance, in which minority class instances are far fewer than majority class instances, may cause the model to fail. Herein, a sequence backward feature selection algorithm based on ranking information (SBFS-RI) and a novel ensemble feature selection method integrating multiple ranking information (FS-MRI) are proposed. The FS-MRI method sets the selection threshold automatically while taking model performance into account, and outputs the best-performing and a more stable feature subset. In addition, an SVM ensemble model with an artificial imbalance rate (SVME-AIR) is proposed to address the class imbalance problem; it effectively combines under-sampling with the AdaBoost ensemble method for the first time. Finally, FS-MRI and SVME-AIR are combined in a two-stage model design, so that the hybrid model handles both the feature selection and the class imbalance problems in enterprise credit risk prediction in the supply chain context. Experiments on supply chain data of Chinese listed enterprises show that the FS-MRI method outperforms nine other feature selection methods and provides more robust and efficient feature subsets. The SVME-AIR model achieves higher AUC and KS values than other ensemble models and single classifiers. When combined, the two methods achieve the best prediction performance, with maximum AUC and KS values of 0.8772 and 0.6363, respectively.
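The abstract reports results as AUC and KS values. As a reminder of what these two metrics measure, the following is a minimal sketch in plain Python, not the authors' evaluation code: AUC is the area under the ROC curve, and KS is the largest gap between the true positive rate and the false positive rate over all thresholds. The function names `roc_points` and `auc_and_ks` and the toy labels and scores are illustrative assumptions, not taken from the paper.

```python
def roc_points(labels, scores):
    """Sweep the decision threshold from high to low and return (fpr, tpr) lists.

    labels: 1 for the minority (risky) class, 0 otherwise (assumed encoding).
    scores: higher score means more likely to belong to class 1.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(order):
        current = scores[order[i]]
        # Instances with tied scores cross the threshold together.
        while i < len(order) and scores[order[i]] == current:
            if labels[order[i]] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / neg)
        tpr.append(tp / pos)
    return fpr, tpr


def auc_and_ks(labels, scores):
    """Return (AUC, KS) computed from the empirical ROC curve."""
    fpr, tpr = roc_points(labels, scores)
    # Trapezoidal rule over consecutive ROC points.
    auc = sum(
        (fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2.0
        for k in range(1, len(fpr))
    )
    # KS statistic: maximum vertical distance between TPR and FPR.
    ks = max(abs(t - f) for f, t in zip(fpr, tpr))
    return auc, ks


# Toy data (hypothetical): three positives, three negatives, one misordered pair.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
toy_auc, toy_ks = auc_and_ks(toy_labels, toy_scores)
print(f"AUC={toy_auc:.4f}, KS={toy_ks:.4f}")  # AUC = 8/9, KS = 2/3 for this toy data
```

Both metrics depend only on how the scores rank the instances, not on a fixed classification cutoff, which is one reason they are commonly used for evaluation on imbalanced credit data.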

Keywords: Enterprise credit risk prediction, Ensemble feature selection, Automatic threshold, Class imbalance, Artificial imbalance rate, Supply chain

Review history: Received 4 November 2021, Revised 4 January 2022, Accepted 26 March 2022, Available online 31 March 2022, Version of Record 2 April 2022.

Official paper URL: https://doi.org/10.1016/j.eswa.2022.117002