Efficient mining of high-utility itemsets using multiple minimum utility thresholds

Authors:

Highlights:

Abstract

High-utility itemset mining (HUIM) has recently attracted considerable attention in data mining because it takes into account many factors that are useful for decision-making by retail managers. Many HUIM algorithms have been proposed, but most of them are limited to a single minimum utility threshold for identifying high-utility itemsets (HUIs). In real-life applications, a single threshold is inadequate and unfair because items differ from one another, so the diversity or importance of each item should be considered. This paper addresses the issue by defining the novel task of HUIM with multiple minimum utility thresholds (called HUIM-MMU). The task lets users specify a different minimum utility threshold for each item in order to identify more useful and specific HUIs, which can generate more profit than HUIs discovered with a single minimum utility threshold. The HUI-MMU algorithm is designed to mine HUIs in a level-wise manner. The sorted downward closure (SDC) property and the least minimum utility (LMU) concept are developed to avoid a combinatorial explosion when identifying HUIs and to ensure that HUI-MMU is complete and correct. In addition, two improved algorithms, HUI-MMUTID and HUI-MMUTE, are presented; they are based on the TID-index and EUCP strategies, which speed up the discovery of HUIs. Substantial experiments on both real-life and synthetic datasets show that the designed algorithms efficiently and effectively discover the complete set of HUIs in databases under multiple minimum utility thresholds.
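To make the task concrete, the following is a minimal brute-force sketch of mining with per-item thresholds. It is not the HUI-MMU algorithm and applies none of the SDC, LMU, TID-index, or EUCP pruning. The toy database, the profit and threshold values, and all function names are hypothetical. The abstract does not say how an itemset's threshold is derived from its items' thresholds; the sketch assumes it is the smallest threshold among the items in the itemset.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps item -> purchased quantity.
transactions = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "c": 3},
    {"b": 1, "c": 2, "d": 1},
]

# Hypothetical unit profits and user-specified per-item minimum utility thresholds.
unit_profit = {"a": 5, "b": 2, "c": 1, "d": 10}
min_utility = {"a": 12, "b": 8, "c": 20, "d": 15}


def utility(itemset, db):
    """Sum quantity * unit profit over every transaction containing the whole itemset."""
    total = 0
    for transaction in db:
        if all(item in transaction for item in itemset):
            total += sum(transaction[item] * unit_profit[item] for item in itemset)
    return total


def itemset_threshold(itemset):
    """Assumption: an itemset's threshold is the smallest threshold of its items."""
    return min(min_utility[item] for item in itemset)


def mine_brute_force(db):
    """Enumerate every itemset and keep those whose utility meets their threshold."""
    items = sorted({item for transaction in db for item in transaction})
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            u = utility(itemset, db)
            if u >= itemset_threshold(itemset):
                result[itemset] = u
    return result


huis = mine_brute_force(transactions)
for itemset, u in sorted(huis.items()):
    print(itemset, u)
```

Under these assumed values, `("b",)` has utility 6 and misses its threshold of 8, while its superset `("a", "b")` has utility 9 and qualifies. Utility is therefore not downward closed on its own, and exhaustive enumeration grows exponentially with the number of items, which is the kind of cost that pruning properties such as SDC are meant to avoid.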

Keywords: HUIM, Multiple thresholds, Sorted downward closure, LMU, HUI-MMU

Review history: Received 10 February 2016, Revised 12 September 2016, Accepted 18 September 2016, Available online 23 September 2016, Version of Record 20 October 2016.

Official paper URL: https://doi.org/10.1016/j.knosys.2016.09.013