Mining frequent closed inter-sequence patterns efficiently using dynamic bit vectors

作者:Bac Le, Minh-Thai Tran, Bay Vo

摘要

Mining frequent sequences is a critical stage before rule generation for sequence databases. Currently, there are two main ways for mining frequent sequences, namely intra-sequence mining and inter-sequence mining. Inter-sequence mining is more attractive than intra-sequence mining because it considers the relationship between sequences in transactions. However, mining all possible frequent inter-sequences takes a long time and requires a lot of memory. Mining frequent closed inter-sequences is efficient because such sequences are compact, and only the necessary information is maintained. CISP-Miner was proposed for mining frequent closed inter-sequence patterns, but it consumes a lot of memory since many closed patterns are mined. This paper proposes an algorithm called ClosedISP for mining frequent closed inter-sequence patterns. The proposed algorithm uses a checking scheme for early eliminating and checking closed patterns without candidate maintenance. ClosedISP uses a dynamic bit vector that combines transaction information to compress data. In addition, ClosedISP adopts a prefix tree and a depth-first search order to reduce the search space and generate non-redundant sequential rules efficiently. Experiments were conducted to compare the proposed algorithm with CISP-Miner to demonstrate the effectiveness of the proposed algorithm in terms of runtime and memory usage.

论文关键词:Dynamic bit vector, Closed inter-sequence pattern, Vertical data format

论文评审过程:

论文官网地址:https://doi.org/10.1007/s10489-014-0630-1