The use of the area under the ROC curve in the evaluation of machine learning algorithms

Abstract

In this paper we investigate the use of the area under the receiver operating characteristic (ROC) curve (AUC) as a performance measure for machine learning algorithms. As a case study we evaluate six machine learning algorithms (C4.5, Multiscale Classifier, Perceptron, Multi-layer Perceptron, k-Nearest Neighbours, and a Quadratic Discriminant Function) on six “real world” medical diagnostics data sets. We compare AUC with the more conventional overall accuracy and find that AUC exhibits a number of desirable properties relative to overall accuracy: increased sensitivity in Analysis of Variance (ANOVA) tests; a standard error that decreases as both AUC and the number of test samples increase; independence from the decision threshold; and invariance to a priori class probabilities. The paper concludes with the recommendation that AUC be used in preference to overall accuracy for “single number” evaluation of machine learning algorithms.
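The abstract does not give formulas, so the following is only a minimal sketch of how the quantities it names might be computed. It assumes AUC is estimated as the Wilcoxon (Mann-Whitney) statistic over classifier scores, and it assumes the Hanley and McNeil approximation for the standard error; the function names `auc_wilcoxon` and `auc_standard_error` and the example scores are hypothetical and not taken from the paper.

```python
import math


def auc_wilcoxon(pos_scores, neg_scores):
    """Estimate AUC as the Wilcoxon (Mann-Whitney) statistic.

    Counts the fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as half.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def auc_standard_error(auc, n_pos, n_neg):
    """Standard error of AUC (assumed: Hanley-McNeil approximation)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    # Guard against tiny negative values from floating-point rounding.
    return math.sqrt(max(variance, 0.0))


# Hypothetical scores, for illustration only.
pos = [0.9, 0.8, 0.4]
neg = [0.7, 0.3, 0.2]
auc = auc_wilcoxon(pos, neg)
se = auc_standard_error(auc, len(pos), len(neg))
```

Because the estimate depends only on how positive and negative scores are ranked against each other, it involves no decision threshold, which is consistent with the threshold-independence property claimed above. Under the assumed approximation, the standard error also shrinks as either the AUC or the number of test samples grows.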

Keywords: The ROC curve, The area under the ROC curve (AUC), Accuracy measures, Cross-validation, Wilcoxon statistic, Standard error

Review history: Received 15 April 1996, Revised 29 July 1996, Accepted 10 September 1996, Available online 7 June 2001.

Official paper URL: https://doi.org/10.1016/S0031-3203(96)00142-2