FHN: An efficient algorithm for mining high-utility itemsets with negative unit profits

作者:

Highlights:

摘要

High utility itemset mining is an emerging data mining task, which consists of discovering highly profitable itemsets (called high utility itemsets) in very large transactional databases. Many algorithms have been proposed to efficiently discover high utility itemsets but most of them assume that items may only have positive unit profits. However, in real-world transactional databases, items (products) often have positive or negative unit profits. Mining high utility itemsets in a transactional database where items have positive or negative unit profits is a computationally expensive task, and it is thus desirable to design more efficient algorithms. To address this issue, we propose an efficient algorithm named FHN (Faster High-Utility itemset miner with Negative unit profits). It relies on a novel PNU-list structure (Positive-and-Negative Utility-list) structure to efficiently mine high utility itemsets, while considering both positive and negative unit profits. Moreover, several pruning strategies are introduced in FHN to reduce the number of candidate itemsets, and thus enhance the performance of FHN. Extensive experimental results on both real-life and synthetic datasets show that the proposed FHN algorithm is in general two to three orders of magnitude faster and can use up to 200 times less memory than the state-of-the-art algorithm HUINIV-Mine. Moreover, it is shown that FHN performs especially well on dense datasets.

论文关键词:High-utility itemset mining,Transaction database,Negative unit profit,PNU-list,Pruning strategies

论文评审过程:Received 22 January 2016, Revised 21 July 2016, Accepted 25 August 2016, Available online 28 August 2016, Version of Record 23 September 2016.

论文官网地址:https://doi.org/10.1016/j.knosys.2016.08.022