RFIMiner: A regression-based algorithm for recently frequent patterns in multiple time granularity data streams

Authors:

Highlights:

Abstract

In this paper, we propose an algorithm for computing and maintaining recently frequent patterns, which are more stable and smaller than the data stream itself, and for dynamically updating them as new transactions arrive. Our study makes two main contributions. First, we propose a regression-based data stream model that differentiates new transactions from old ones. The model maps transactions onto multiple time granularities and automatically adjusts the rate at which transactions fade by means of a fading factor. This factor defines the desired lifetime of transaction information in the data stream. Second, we develop RFIMiner, a single-scan algorithm for mining recently frequent patterns from data streams. The algorithm exploits a special property of suffix-trees, so the suffix-trees need not be traversed when patterns are discovered. To suit the suffix-trees, we also adopt a new method called Depth-first and Bottom–up Inside Itemset Growth, which finds further recently frequent patterns from those already known to be frequent. The method also avoids redundant computation and the generation of candidate patterns. We conduct detailed experiments to evaluate several aspects of the algorithm's performance. The results indicate that the new method scales well and meets the quality and efficiency requirements of mining recently frequent itemsets in data streams.
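
To illustrate the general idea of a fading factor, here is a minimal Python sketch of time-decayed support counting. It is not the regression-based model or the suffix-tree structure of RFIMiner, which the abstract does not specify. It assumes plain exponential decay and enumerates small itemsets by brute force; the class name `DecayedItemsetCounter` and its parameters `fading_factor` and `max_size` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


class DecayedItemsetCounter:
    """Illustrative exponentially decayed itemset counts (not RFIMiner)."""

    def __init__(self, fading_factor, max_size=2):
        # fading_factor in (0, 1]: smaller values forget old transactions faster.
        if not 0.0 < fading_factor <= 1.0:
            raise ValueError("fading_factor must be in (0, 1]")
        self.fading_factor = fading_factor
        self.max_size = max_size          # largest itemset size enumerated
        self.counts = defaultdict(float)  # itemset (sorted tuple) -> decayed count
        self.total = 0.0                  # decayed number of transactions seen

    def add(self, transaction):
        """Fade all existing counts once, then count the new transaction."""
        for itemset in self.counts:
            self.counts[itemset] *= self.fading_factor
        self.total = self.total * self.fading_factor + 1.0
        items = sorted(set(transaction))
        for size in range(1, self.max_size + 1):
            for itemset in combinations(items, size):
                self.counts[itemset] += 1.0

    def recently_frequent(self, min_support):
        """Return itemsets whose decayed support is at least min_support."""
        return {
            itemset: count / self.total
            for itemset, count in self.counts.items()
            if count / self.total >= min_support
        }


counter = DecayedItemsetCounter(fading_factor=0.5, max_size=1)
for transaction in (["a"], ["a"], ["b"]):
    counter.add(transaction)
print(counter.recently_frequent(min_support=0.5))
```

In this sketch the most recent item `b` outweighs `a` even though `a` occurred twice, because each earlier occurrence has been faded. Scaling every count on each arrival costs time linear in the number of stored itemsets; this is chosen here only for clarity.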

Keywords: Data mining, Data streams, Multiple time granularities, Recently frequent patterns, Suffix-trees

Review history: Available online 7 September 2006.

Official paper URL: https://doi.org/10.1016/j.amc.2006.06.115