Efficient algorithms for incremental maintenance of closed sequential patterns in large databases

作者:

Highlights:

摘要

Recent study shows that mining compact frequent patterns (such as closed patterns and compressed patterns) can alleviate the interpretability and efficiency problem encountered by traditional frequent pattern mining methods. Compact frequent patterns keep exact or approximate supports of a complete set of frequent patterns, and the number of them is often orders of magnitude smaller. Several efficient algorithms have been proposed to mine compact sequential patterns. However, sequence databases are not always static. Sequences (or items) are often added to and deleted from databases. A slight change made on a database may lead to the change of compact patterns. Mining from scratch is very time-consuming and thus infeasible. In this paper, we explore how to efficiently maintain closed sequential patterns in a dynamic sequence database environment. A compact structure CSTree is designed to keep closed sequential patterns, and its nice properties are carefully studied. Two efficient algorithms, IMCSA and IMCSD, are developed to maintain the CSTree upon incremental update. The algorithms make full use of the properties of CSTree to find nodes whose states are obsolete and avoid unnecessary node extension and closure checking operations to accelerate the incremental update process. A thorough experimental study on various real and synthetic datasets shows that the proposed algorithms outperform the state-of-the-art algorithms – PrefixSpan, CloSpan, BIDE and a recently proposed incremental mining algorithm IncSpan by about a factor of 4 to more than an order of magnitude.

论文关键词:Data mining,Frequent patterns,Closed sequential patterns,Incremental maintenance

论文评审过程:Received 29 September 2007, Revised 9 July 2008, Accepted 15 August 2008, Available online 28 August 2008.

论文官网地址:https://doi.org/10.1016/j.datak.2008.08.003