Extreme value correction: a method for correcting optimistic estimations in rule learning

Authors: Martin Možina, Janez Demšar, Ivan Bratko, Jure Žabkar

Abstract

Machine learning algorithms must evaluate the hypotheses they construct, both to choose the best hypothesis during learning and to assess the quality of the model afterwards. Since these estimates, in particular the former, are computed on the same training data from which the hypotheses were constructed, they are usually optimistic. The paper presents three solutions: two for the artificial boundary cases with the smallest and the largest optimism, and a general correction procedure called extreme value correction (EVC), based on the extreme value distribution. We demonstrate the application of the technique to rule learning, specifically to estimating the classification accuracy of a single rule, and evaluate it on an artificial data set and on a number of UCI data sets. We observed that the correction improved the accuracy estimates. We also describe an approach for combining rules into a linear global classifier and show that using EVC estimates leads to more accurate classifiers.
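The optimism described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' EVC procedure; it only shows the multiple-comparisons effect that motivates it. It assumes a hypothetical setting in which every candidate rule has the same true accuracy (0.5) and the learner keeps the rule with the highest accuracy on the training examples. The function names, sample sizes, and seed are illustrative choices, not taken from the paper.

```python
import random


def best_rule_estimate(n_rules, n_examples, true_acc, rng):
    """Return the highest observed accuracy among n_rules candidate rules.

    Each rule is simulated as n_examples independent predictions that are
    correct with probability true_acc (an assumption of this sketch).
    """
    best = 0.0
    for _ in range(n_rules):
        correct = sum(rng.random() < true_acc for _ in range(n_examples))
        best = max(best, correct / n_examples)
    return best


def mean_best_estimate(n_rules, n_examples=50, true_acc=0.5, trials=200, seed=0):
    """Average the selected rule's observed accuracy over repeated trials."""
    rng = random.Random(seed)
    total = sum(
        best_rule_estimate(n_rules, n_examples, true_acc, rng)
        for _ in range(trials)
    )
    return total / trials


# With one candidate the estimate is roughly unbiased; with many candidates
# the selected maximum overstates the true accuracy of 0.5.
single = mean_best_estimate(n_rules=1)
many = mean_best_estimate(n_rules=100)
print(f"1 candidate rule:    {single:.3f}")
print(f"100 candidate rules: {many:.3f}")
```

The gap between the two printed values is the kind of optimism that a correction would need to remove; the more candidates are compared, the larger the gap becomes.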

Keywords: Machine learning, Multiple comparisons, Extreme value distribution, Rule learning

Paper URL: https://doi.org/10.1007/s10994-018-5731-3