Quantifying the utility of the past in mining large databases

Abstract

Incremental mining algorithms that can efficiently derive the current mining output by utilizing previous mining results are attractive to business organizations since data mining is typically a resource-intensive recurring activity. In this paper, we present the DELTA algorithm for the robust and efficient incremental mining of association rules on large market basket databases. DELTA guarantees efficiency by ensuring that, for any dataset, at most three passes over the increment and one pass over the previous database are required to generate the desired rules. Further, it handles “multi-support” environments where the support requirements for the current mining differ from those used in the previous mining, a feature in tune with the exploratory nature of the mining process. We present a performance evaluation of DELTA on large databases over a range of increment sizes and data distributions, as well as changes in support requirements. The experimental results show that DELTA can provide significant improvements in execution times over previously proposed incremental algorithms in all of these environments. In fact, for many workloads, its performance is close to that achieved by an optimal, but practically infeasible, algorithm.
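
To make the idea of reusing previous mining results concrete, the following is a minimal sketch of incremental frequent-itemset maintenance. It is not the DELTA algorithm, whose details are not given in this abstract. It only illustrates the general principle that stored counts can be combined with counts from the increment, and that the previous database needs to be rescanned only for itemsets whose old counts are unknown and that could still reach the threshold. The function names (`count_itemsets`, `update_frequent`), the brute-force enumeration, the `max_size` cap, and the restriction to an unchanged or raised support threshold are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def count_itemsets(transactions, max_size, only=None):
    """Count itemsets of up to max_size items in one pass (brute force)."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                if only is None or itemset in only:
                    counts[itemset] += 1
    return counts


def update_frequent(old_counts, old_min_support, old_db, increment,
                    min_support, max_size=2):
    """Hypothetical helper: update frequent itemsets after an increment.

    old_counts maps each itemset that was frequent in old_db (at
    old_min_support) to its count there. Returns the itemsets frequent in
    old_db + increment and a flag saying whether old_db had to be rescanned.
    """
    # Assumption of this sketch: the support is unchanged or raised.
    if min_support < old_min_support:
        raise ValueError("sketch only handles an unchanged or raised support")

    n_old, n_inc = len(old_db), len(increment)
    threshold = min_support * (n_old + n_inc)

    # One pass over the increment.
    inc_counts = count_itemsets(increment, max_size)

    # Itemsets with stored counts are updated without touching old_db.
    frequent = {}
    for itemset, old_count in old_counts.items():
        total = old_count + inc_counts.get(itemset, 0)
        if total >= threshold:
            frequent[itemset] = total

    # An itemset without a stored count occurred fewer than
    # old_min_support * n_old times in old_db, which bounds its total.
    old_bound = old_min_support * n_old
    candidates = {s for s, c in inc_counts.items()
                  if s not in old_counts and c + old_bound > threshold}

    # Rescan old_db only if some candidate could still be frequent.
    scanned_old = bool(candidates)
    if candidates:
        rescanned = count_itemsets(old_db, max_size, only=candidates)
        for itemset in candidates:
            total = rescanned.get(itemset, 0) + inc_counts[itemset]
            if total >= threshold:
                frequent[itemset] = total
    return frequent, scanned_old
```

In this sketch the expensive step, the pass over the previous database, is skipped whenever the increment introduces no viable new candidates. A full algorithm would also need candidate generation that avoids brute-force enumeration and would have to handle a lowered support threshold, which the sketch deliberately rejects.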

Keywords: Data Mining, Association Rule, Hierarchical Association Rule, Incremental Mining

Article history: Received 6 February 2000, revised 8 April 2000, available online 25 August 2000.

Official URL: https://doi.org/10.1016/S0306-4379(00)00021-1