A critical assessment of imbalanced class distribution problem: The case of predicting freshmen student attrition

Authors:

Highlights:

• Class imbalance is common to many prediction problems.

• Classification techniques can yield deceptively high prediction accuracy on an imbalanced dataset.

• All balancing techniques improved the prediction accuracy for the minority class.

• SVM combined with the SMOTE data-balancing technique achieved the best overall accuracy.

• A sensitivity analysis revealed the most important variables for attrition prediction.
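
The second highlight, that accuracy can look good while the minority class is missed, can be illustrated with a minimal sketch. The 80/20 class split, the label encoding, and the always-predict-majority "classifier" below are illustrative assumptions, not data or methods from the paper.

```python
# Hypothetical labels: 1 = student left (minority), 0 = student stayed (majority).
# The 80/20 split is an illustrative assumption, not a figure from the paper.
y_true = [0] * 80 + [1] * 20

# A trivial "classifier" that always predicts the majority class.
y_pred = [0] * len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of all cases predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, label):
    """Fraction of cases whose true class is `label` that were predicted as `label`."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    total = sum(1 for t in y_true if t == label)
    return hits / total


overall = accuracy(y_true, y_pred)
minority_recall = recall(y_true, y_pred, 1)
majority_recall = recall(y_true, y_pred, 0)
print(overall, minority_recall, majority_recall)
```

Under these assumptions the overall accuracy is 0.8 even though no minority-class case is identified (minority recall 0.0, majority recall 1.0), which is why per-class measures are worth reporting alongside overall accuracy.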

Keywords: Student retention, Attrition, Prediction, Imbalanced class distribution, SMOTE, Sampling, Sensitivity analysis

Publication history: Available online 31 July 2013.

Official paper URL: https://doi.org/10.1016/j.eswa.2013.07.046