Fast mining erasable itemsets using NC_sets

Authors:

Highlights:

Abstract

Mining erasable itemsets, first introduced in 2009, is an emerging data mining task. In this paper, we present a new data representation called NC_set, which keeps track of the complete information needed for mining erasable itemsets. Based on NC_set, we propose a new algorithm called MERIT for mining erasable itemsets efficiently. MERIT owes its efficiency to three techniques. First, the NC_set is a compact structure that prunes irrelevant data automatically. Second, computing the gain of an itemset is transformed into combining NC_sets, which can be done in linear time. Third, in some cases MERIT can find erasable itemsets directly, without generating candidate itemsets. To evaluate MERIT, we conducted extensive experiments on many synthetic product databases. Our performance study shows that MERIT is efficient and is on average about two orders of magnitude faster than META, the first algorithm for mining erasable itemsets.
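To make the notion of the gain of an itemset concrete, the following is a minimal brute-force sketch. It assumes the definition commonly used in the erasable-itemset literature, which the abstract itself does not spell out: each product has a set of component items and a profit, the gain of an itemset is the total profit of the products that use at least one of its items, and an itemset is erasable when its gain does not exceed a threshold fraction of the total profit. The names `gain`, `erasable_itemsets`, and `products` are hypothetical, and the code enumerates every candidate; it is neither MERIT nor an NC_set implementation.

```python
from itertools import combinations


def gain(itemset, products):
    """Total profit of the products that use at least one item of `itemset`.

    Assumed definition (not stated in the abstract): these are the products
    that could no longer be made if the items were erased.
    """
    items = set(itemset)
    return sum(profit for components, profit in products if items & components)


def erasable_itemsets(products, threshold):
    """Brute-force enumeration of itemsets whose gain is <= threshold * total profit.

    Illustrative baseline only: it tests every non-empty subset of items,
    so it is exponential in the number of distinct items.
    """
    total = sum(profit for _, profit in products)
    limit = threshold * total
    all_items = sorted(set().union(*(components for components, _ in products)))
    result = []
    for size in range(1, len(all_items) + 1):
        for candidate in combinations(all_items, size):
            if gain(candidate, products) <= limit:
                result.append(candidate)
    return result


# Hypothetical toy product database: (component items, profit).
products = [
    (frozenset({"a", "b"}), 50),
    (frozenset({"b", "c"}), 20),
    (frozenset({"c", "d"}), 30),
]

for itemset in erasable_itemsets(products, 0.5):
    print(itemset, gain(itemset, products))
```

In this toy database the total profit is 100, so with a threshold of 0.5 an itemset is erasable when its gain is at most 50. The sketch rescans the whole database for every candidate, which is the kind of cost that a compact representation with linear-time combination is meant to avoid.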

Keywords: Data mining, Erasable itemsets, NC_sets, Algorithms, Data structure

Review history: Available online 8 October 2011.

Official paper URL: https://doi.org/10.1016/j.eswa.2011.09.143