Fast and memory efficient mining of high-utility itemsets from data streams: with and without negative item profits

作者:Hua-Fu Li, Hsin-Yun Huang, Suh-Yin Lee

摘要

Mining utility itemsets from data steams is one of the most interesting research issues in data mining and knowledge discovery. In this paper, two efficient sliding window-based algorithms, MHUI-BIT (Mining High-Utility Itemsets based on BITvector) and MHUI-TID (Mining High-Utility Itemsets based on TIDlist), are proposed for mining high-utility itemsets from data streams. Based on the sliding window-based framework of the proposed approaches, two effective representations of item information, Bitvector and TIDlist, and a lexicographical tree-based summary data structure, LexTree-2HTU, are developed to improve the efficiency of discovering high-utility itemsets with positive profits from data streams. Experimental results show that the proposed algorithms outperform than the existing approaches for discovering high-utility itemsets from data streams over sliding windows. Beside, we also propose the adapted approaches of algorithms MHUI-BIT and MHUI-TID in order to handle the case when we are interested in mining utility itemsets with negative item profits. Experiments show that the variants of algorithms MHUI-BIT and MHUI-TID are efficient approaches for mining high-utility itemsets with negative item profits over stream transaction-sensitive sliding windows.

论文关键词:Data mining, Data streams, Utility mining, High-utility itemsets, Utility itemset with positive item profits, Utility itemset with negative item profits

论文评审过程:

论文官网地址:https://doi.org/10.1007/s10115-010-0330-z