Discovering frequent itemsets by support approximation and itemset clustering

Abstract

To speed up association rule mining, a novel concept based on support approximation was previously proposed for generating frequent itemsets. However, the mining technique built on this concept can yield unstable accuracy because of approximation error. To overcome this drawback, this paper combines a new clustering method with support approximation and proposes a mining method, CAC, that discovers frequent itemsets based on the Principle of Inclusion and Exclusion. The clustering technique groups highly similar members to improve the accuracy of support approximation. The hit ratio analysis and experimental results presented in the paper verify that CAC improves accuracy. Because it neither scans the database repeatedly nor stores vast amounts of information in memory, the CAC method is able to mine frequent itemsets with relative stability. These advantages in both accuracy and performance make CAC an effective and useful technique for discovering frequent itemsets in a database.
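The abstract names the Principle of Inclusion and Exclusion as the basis for support approximation but does not spell it out. The following is a minimal sketch of that principle applied to itemset supports, not the CAC algorithm itself. The toy transactions and the helper names `support` and `union_count_by_inclusion_exclusion` are assumptions made for illustration. It shows that the number of transactions containing at least one of a set of items equals an alternating sum of itemset supports, and that truncating the sum early gives upper and lower bounds (the Bonferroni inequalities), which is one general way such sums can be approximated.

```python
from itertools import combinations


def support(transactions, itemset):
    """Count the transactions that contain every item in `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def union_count_by_inclusion_exclusion(transactions, items, max_order=None):
    """Count transactions containing at least one of `items`.

    Uses the alternating sum of supports over all non-empty subsets.
    If `max_order` is smaller than len(items), the sum is truncated and
    the result is only a bound (hypothetical illustration of approximation).
    """
    items = list(items)
    if max_order is None:
        max_order = len(items)
    total = 0
    for k in range(1, max_order + 1):
        sign = 1 if k % 2 == 1 else -1
        for subset in combinations(items, k):
            total += sign * support(transactions, subset)
    return total


# Toy database, assumed for illustration only.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b"},
    {"c", "d"},
    {"d"},
]
items = ["a", "b", "c"]

# Direct count of transactions that contain a, b, or c.
exact = sum(1 for t in transactions if t & set(items))

full = union_count_by_inclusion_exclusion(transactions, items)
upper = union_count_by_inclusion_exclusion(transactions, items, max_order=1)
lower = union_count_by_inclusion_exclusion(transactions, items, max_order=2)

print(exact, full, upper, lower)  # prints: 5 5 9 4
```

On this toy data the full alternating sum reproduces the direct count, while the truncated sums bracket it. How CAC chooses and clusters the terms it approximates is described in the paper, not in this sketch.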

Keywords: Support approximation, Clustering, Data mining, Combinatorial approximation, Frequent itemset

Article history: Received 1 February 2007, Revised 12 September 2007, Accepted 16 October 2007, Available online 1 November 2007.

Official paper URL: https://doi.org/10.1016/j.datak.2007.10.003