Efficient approach of sliding window-based high average-utility pattern mining with list structures

Authors:

Highlights:

Abstract

Data mining has been actively studied, and it has grown in importance with the development of information technology and the demands of diverse applications such as retail, medicine, and manufacturing. High average-utility pattern mining is a type of data mining that measures the value of a pattern by dividing its utility by its length. Because it takes pattern length into account, it addresses a shortcoming of conventional utility pattern mining and extracts more meaningful patterns. Meanwhile, stream data is produced in real time, and recent data is more important than old data. However, most previous high average-utility pattern mining approaches cannot find the latest valuable patterns over data streams. Although earlier research used a sliding window model to handle only the recent data, it is limited by its data structure. Since that work is a tree-based approach, it generates numerous candidate patterns and requires an additional database scan to calculate the actual utilities of the candidates. Generating candidate patterns degrades runtime and memory efficiency and makes real-time data processing difficult. In this paper, we propose an efficient algorithm that uses a list structure for sliding window-based high average-utility pattern mining. The proposed algorithm scans the stream data once, mines the complete set of recent high average-utility patterns without generating candidates, and reduces the search space with a novel pruning technique. Experimental results show that the proposed algorithm outperforms state-of-the-art algorithms in runtime, memory usage, and scalability on real and synthetic datasets.
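To make the average-utility measure and the sliding window concrete, the following is a minimal Python sketch. It is not the list-based algorithm proposed in the paper: it enumerates patterns by brute force and applies no pruning. The names `average_utility`, `SlidingWindow`, and `min_au` are hypothetical, each transaction is assumed to be a mapping from an item to its utility in that transaction, and the window is assumed to hold a fixed number of the most recent transactions.

```python
from collections import deque
from itertools import combinations


def average_utility(pattern, transaction):
    """Average utility of `pattern` in one transaction.

    `transaction` maps item -> utility (an assumed representation).
    Returns 0.0 when the transaction does not contain every item.
    """
    if not all(item in transaction for item in pattern):
        return 0.0
    return sum(transaction[item] for item in pattern) / len(pattern)


class SlidingWindow:
    """Keeps only the `size` most recent transactions (illustrative only)."""

    def __init__(self, size):
        # A deque with maxlen drops the oldest transaction automatically.
        self.transactions = deque(maxlen=size)

    def add(self, transaction):
        self.transactions.append(transaction)

    def average_utility(self, pattern):
        # Sum the per-transaction average utilities over the current window.
        return sum(average_utility(pattern, t) for t in self.transactions)

    def high_average_utility_patterns(self, min_au):
        # Brute-force enumeration of every item combination in the window.
        # This is exponential and only meant to show the definition.
        items = sorted({item for t in self.transactions for item in t})
        result = {}
        for k in range(1, len(items) + 1):
            for pattern in combinations(items, k):
                au = self.average_utility(pattern)
                if au >= min_au:
                    result[pattern] = au
        return result


window = SlidingWindow(size=2)
window.add({"a": 4, "b": 2})
window.add({"a": 2, "c": 6})
window.add({"b": 4, "c": 2})  # the first transaction slides out here
recent = window.high_average_utility_patterns(min_au=4)
```

After the third transaction arrives, the oldest one leaves the window, so only patterns supported by the two most recent transactions can qualify. In this sketch the pattern `("a", "c")` has average utility (2 + 6) / 2 = 4, which meets the threshold, whereas `("a",)` alone falls to 2 and does not.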

Keywords: Association rule mining, Sliding window, Stream mining, High average-utility pattern mining

Review history: Received 3 January 2022, Revised 11 August 2022, Accepted 12 August 2022, Available online 22 August 2022, Version of Record 13 September 2022.

Official URL: https://doi.org/10.1016/j.knosys.2022.109702