RIB: A Robust Itemset-based Bayesian approach to classification

Authors:

Highlights:

Abstract

Real-life data is often affected by noise. To cope with this issue, classification techniques robust to noisy data are needed. Bayesian approaches are known to be fairly robust to noise. However, to compute probability estimates, state-of-the-art Bayesian approaches adopt a lazy pattern-based strategy, which shows some limitations when coping with data affected by a notable amount of noise.

This paper proposes RIB (Robust Itemset-based Bayesian classifier), a novel eager and pattern-based Bayesian classifier which discovers frequent itemsets from training data and exploits them to build accurate probability estimates. Enforcing a minimum frequency of occurrence on the considered itemsets reduces the sensitivity of the probability estimates to noise. Furthermore, learning a Bayesian Network that also considers high-order dependences among data, usually neglected by traditional Bayesian approaches, appears to be more robust to noise and data overfitting than selecting a small subset of patterns tailored to each test instance.

The experiments demonstrate that RIB is, on average, more accurate than most state-of-the-art classifiers, both Bayesian and non-Bayesian, on benchmark datasets in which different kinds and levels of noise are injected. Furthermore, its performance on the same datasets prior to noise injection is competitive with that of state-of-the-art classifiers.
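
The minimum-frequency idea mentioned in the abstract can be illustrated with a small sketch. This is not the RIB algorithm, whose details are not given here. It only shows, under assumed toy data, how a support threshold discards rarely occurring itemsets before any per-class probability estimate is built from them. The function name `frequent_itemsets`, the `min_support` and `max_size` parameters, and the toy records are all hypothetical, and the brute-force enumeration stands in for a real frequent itemset miner.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return {itemset: relative support} for itemsets meeting min_support.

    Brute-force enumeration for illustration only (hypothetical helper,
    not the mining procedure used by RIB).
    """
    n = len(transactions)
    counts = Counter()
    for t in transactions:
        items = sorted(t)
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    # Keep only itemsets whose relative frequency reaches the threshold.
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


# Assumed toy training data: each record is a set of attribute=value items
# paired with a class label.
train = [
    ({"outlook=sunny", "wind=weak"}, "play"),
    ({"outlook=sunny", "wind=weak"}, "play"),
    ({"outlook=sunny", "wind=strong"}, "play"),
    ({"outlook=rain", "wind=strong"}, "stay"),
    ({"outlook=rain", "wind=strong"}, "stay"),
    ({"outlook=rain", "wind=weak"}, "stay"),
]

# Group records by class, then mine each class separately so that the
# support of an itemset approximates P(itemset | class).
by_class = {}
for items, label in train:
    by_class.setdefault(label, []).append(items)

frequent = {
    label: frequent_itemsets(rows, min_support=0.5)
    for label, rows in by_class.items()
}

for label, itemsets in frequent.items():
    for itemset, support in sorted(itemsets.items(), key=lambda kv: sorted(kv[0])):
        print(label, sorted(itemset), round(support, 3))
```

In this toy setting, an itemset seen in only one of the three records of a class falls below the 0.5 threshold and is dropped, so a single mislabeled or corrupted record cannot contribute an estimate on its own. How RIB combines the surviving itemsets into a Bayesian Network is beyond what this sketch covers.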

Keywords: Data mining, Frequent itemset mining, Classification, Bayesian modeling, Noisy data

Review history: Received 12 December 2013, Revised 12 August 2014, Accepted 13 August 2014, Available online 23 August 2014.

Official paper URL: https://doi.org/10.1016/j.knosys.2014.08.015