A Bayesian approach for comparing cross-validated algorithms on multiple data sets

Authors: Giorgio Corani, Alessio Benavoli

Abstract

We present a Bayesian approach for making statistical inference about the accuracy (or any other score) of two competing algorithms that have been assessed via cross-validation on multiple data sets. The approach consists of two parts. The first is a novel correlated Bayesian \(t\) test for analyzing the cross-validation results on a single data set, which accounts for the correlation caused by the overlapping training sets. The second part merges the posterior probabilities computed by the correlated Bayesian \(t\) test on the different data sets to make inference on multiple data sets, by adopting a Poisson-binomial model. The inferences on multiple data sets account for the different uncertainty of the cross-validation results on each data set; this is the first test able to do so. It is generally more powerful than the signed-rank test when ten runs of cross-validation are performed, as is generally recommended anyway.
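A minimal, standard-library-only Python sketch of the two pieces is given below. It rests on assumptions not stated in the abstract. First, the posterior of the mean score difference on one data set is taken to be a Student t with n-1 degrees of freedom and scale sqrt((1/n + rho/(1-rho)) * s^2), where rho is the fraction of data used for testing (rho = 1/k for k-fold cross-validation), following the Nadeau and Bengio correlation correction. Second, data sets are treated as independent, so the number of data sets on which the first algorithm is better is Poisson-binomial. The function names and the "better on a majority of data sets" summary are hypothetical illustrations, not the authors' exact decision rule.

```python
import math


def student_t_cdf(x, df, n_steps=2000):
    """CDF of the standard Student t, integrated numerically.

    Uses the substitution x = sqrt(df) * tan(theta), which maps the
    density to C * cos(theta)**(df - 1) on a bounded interval, then
    applies Simpson's rule (n_steps must be even).
    """
    theta_max = math.atan(abs(x) / math.sqrt(df))
    # log of Gamma((df+1)/2) / (sqrt(pi) * Gamma(df/2))
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(math.pi)
    h = theta_max / n_steps
    f = lambda th: math.cos(th) ** (df - 1)
    total = f(0.0) + f(theta_max)
    for i in range(1, n_steps):
        total += (4 if i % 2 else 2) * f(i * h)
    half_mass = math.exp(log_c) * total * h / 3
    return 0.5 + half_mass if x >= 0 else 0.5 - half_mass


def correlated_bayesian_t_test(diffs, rho):
    """Posterior P(mean difference > 0) for one data set (assumed form).

    diffs: per-fold score differences (algorithm A minus B) from all runs.
    rho:   assumed correlation, e.g. 1/k for k-fold cross-validation.
    """
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    if var == 0:  # degenerate case: all differences identical
        return 1.0 if mean > 0 else (0.0 if mean < 0 else 0.5)
    scale = math.sqrt((1 / n + rho / (1 - rho)) * var)
    # By symmetry, P(mu > 0) = F(mean / scale) for a t posterior centred at mean.
    return student_t_cdf(mean / scale, n - 1)


def poisson_binomial_pmf(probs):
    """Distribution of the number of successes among independent trials."""
    pmf = [1.0]
    for p in probs:
        new = [0.0] * (len(pmf) + 1)
        for k, v in enumerate(pmf):
            new[k] += v * (1 - p)      # A not better on this data set
            new[k + 1] += v * p        # A better on this data set
        pmf = new
    return pmf


def prob_better_on_majority(probs):
    """Hypothetical summary: P(A is better on more than half the data sets)."""
    pmf = poisson_binomial_pmf(probs)
    m = len(probs)
    return sum(v for k, v in enumerate(pmf) if 2 * k > m)


if __name__ == "__main__":
    diffs = [0.01, 0.02, 0.015, 0.03, 0.005, 0.02, 0.01, 0.025, 0.015, 0.02]
    p1 = correlated_bayesian_t_test(diffs, rho=0.1)
    print(p1, prob_better_on_majority([p1, 0.6, 0.3]))
```

Integrating over the angle rather than over x keeps the interval bounded, so the same fixed Simpson grid stays accurate even for very large t statistics.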

Keywords: Bayesian hypothesis tests, Signed-rank test, Cross-validation, Poisson-binomial, Hypothesis test, Evaluation of classifiers

Paper URL: https://doi.org/10.1007/s10994-015-5486-z