Efficiently mining high utility itemsets with negative unit profits

Authors:

Highlights:

Abstract

High Utility Itemset (HUI) mining is an important problem in the data mining literature. It considers the utilities of items (such as profits and margins) to discover interesting patterns in transactional databases. Several data structures, pruning strategies and algorithms have been proposed to mine high utility itemsets efficiently. Most of these works, however, do not consider itemsets containing items with negative unit profits, which give a decision maker greater flexibility in determining profitable itemsets. This paper aims to advance the state of the art and presents a generalized high utility mining (GHUM) method that considers both positive and negative unit profits. The proposed method uses a simplified utility-list data structure to store itemset information during the mining process. The paper also introduces a novel utility-based anti-monotonic property to improve the performance of HUI mining. Furthermore, GHUM adapts key pruning strategies from the basic HUI mining literature and presents new pruning strategies that significantly improve mining performance. The proposed method is evaluated on a set of benchmark sparse and dense datasets and compared against a state-of-the-art method. A rigorous experimental evaluation is performed, and the implications of the key findings are presented. Overall, GHUM delivered more than an order of magnitude improvement in performance over the state-of-the-art FHN method while using a fraction of its memory.
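To make the problem setting concrete, the following Python sketch computes itemset utilities when some unit profits are negative and enumerates high utility itemsets by depth-first search. It is not GHUM: it rescans the database instead of using utility lists, and it prunes with an assumed upper bound in which each transaction's utility counts only its positive item utilities. Because negative-profit items can only lower an itemset's utility, that sum is never below the utility of any itemset in the transaction, and summing it over the transactions that contain an itemset can only shrink as the itemset grows, so the bound is anti-monotonic. The toy profits and transactions and the names `PROFIT`, `DB`, `contains`, `utility`, `rtu`, `rtwu` and `mine_huis` are hypothetical.

```python
# Hypothetical toy data: unit profits (item c has a negative profit) and
# transactions given as {item: purchased quantity}.
PROFIT = {"a": 5, "b": 2, "c": -3, "d": 4}
DB = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "c": 3, "d": 1},
    {"b": 1, "d": 2},
    {"a": 1, "b": 1, "c": 2, "d": 1},
]


def contains(t, itemset):
    """True if transaction t contains every item of itemset."""
    return all(i in t for i in itemset)


def utility(itemset, db=DB, profit=PROFIT):
    """u(X): sum of quantity * unit profit over the items of X, in every transaction containing X."""
    return sum(
        sum(t[i] * profit[i] for i in itemset)
        for t in db if contains(t, itemset)
    )


def rtu(t, profit=PROFIT):
    """Transaction utility counting only positive item utilities; never below u(X, t) for X in t."""
    return sum(q * profit[i] for i, q in t.items() if profit[i] > 0)


def rtwu(itemset, db=DB, profit=PROFIT):
    """Upper bound on u(X): sum of rtu over transactions containing X (anti-monotonic)."""
    return sum(rtu(t, profit) for t in db if contains(t, itemset))


def mine_huis(min_util, db=DB, profit=PROFIT):
    """Depth-first search for itemsets with utility >= min_util (assumes min_util > 0)."""
    items = sorted(profit)
    found = {}

    def search(prefix, start):
        for idx in range(start, len(items)):
            candidate = prefix + (items[idx],)
            if rtwu(candidate, db, profit) < min_util:
                continue  # no extension of candidate can reach min_util
            u = utility(candidate, db, profit)
            if u >= min_util:
                found[candidate] = u
            search(candidate, idx + 1)  # extend with lexicographically later items

    search((), 0)
    return found
```

On the toy data, `mine_huis(16)` returns `{('a',): 20, ('a', 'b'): 16, ('a', 'd'): 23, ('b', 'd'): 16, ('d',): 16}`. The positive-only bound matters here: a plain transaction-weighted utility that includes negative items would give item `a` a bound of 6 + 5 + 5 = 16, below its actual utility of 20, so pruning on it with a threshold of 17 would wrongly discard `a`.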

Keywords: High utility itemset, Anti-monotonic property, Utility list data structure, Negative unit profits, Pruning strategies, Frequent itemset mining

Review history: Received 9 March 2017, Revised 22 December 2017, Accepted 28 December 2017, Available online 29 December 2017, Version of Record 20 February 2018.

Official paper URL: https://doi.org/10.1016/j.knosys.2017.12.035