Efficient high utility itemset mining using buffered utility-lists

作者:Quang-Huy Duong, Philippe Fournier-Viger, Heri Ramampiaro, Kjetil Nørvåg, Thu-Lan Dam

摘要

Discovering high utility itemsets in transaction databases is a key task for studying the behavior of customers. It consists of finding groups of items bought together that yield a high profit. Several algorithms have been proposed to mine high utility itemsets using various approaches and more or less complex data structures. Among existing algorithms, one-phase algorithms employing the utility-list structure have shown to be the most efficient. In recent years, the simplicity of the utility-list structure has led to the development of numerous utility-list based algorithms for various tasks related to utility mining. However, a major limitation of utility-list based algorithms is that creating and maintaining utility-lists are time consuming and can consume a huge amount of memory. The reasons are that numerous utility lists are built and that the utility-list intersection/join operation to construct a utility-list is costly. This paper addresses this issue by proposing an improved utility-list structure called utility-list buffer to reduce the memory consumption and speed up the join operation. This structure is integrated into a novel algorithm named ULB-Miner (Utility-List Buffer for high utility itemset Miner), which introduces several new ideas to more efficiently discover high utility itemsets. ULB-Miner uses the designed utility-list buffer structure to efficiently store and retrieve utility-lists, and reuse memory during the mining process. Moreover, the paper also introduces a linear time method for constructing utility-list segments in a utility-list buffer. An extensive experimental study on various datasets shows that the proposed algorithm relying on the novel utility-list buffer structure is highly efficient in terms of both execution time and memory consumption. The ULB-Miner algorithm is up to 10 times faster than the FHM and HUI-Miner algorithms and consumes up to 6 times less memory. Moreover, it performs well on both dense and sparse datasets.

论文关键词:Pattern mining, Itemset mining, Utility mining, Utility list, Utility list buffer

论文评审过程:

论文官网地址:https://doi.org/10.1007/s10489-017-1057-2