Incremental high utility pattern mining with static and dynamic databases

Authors: Unil Yun, Heungmo Ryang

Abstract

Pattern mining is a data mining technique for discovering significant patterns and has been applied in areas such as disease analysis in medical databases and business decision making. Frequent pattern mining, which is based on item frequencies, is the most fundamental topic in the field. However, frequency alone makes it difficult to discover important patterns, because it does not reflect characteristics of real-world databases such as the relative importance of items and non-binary transactions. Utility pattern mining has therefore emerged as a research topic that takes these characteristics into account. Meanwhile, in real-world applications, data newly generated by continuous operation, or data from other databases brought in for integrated analysis, can be gradually added to the current database. To handle existing and new data efficiently as a single database, the added data must be reflected in previous analysis results without analyzing the whole database again. In this paper, we propose an algorithm called HUPID-Growth (High Utility Patterns in Incremental Databases Growth) for mining high utility patterns in incremental databases. We also propose a tree structure named HUPID-Tree (High Utility Patterns in Incremental Databases Tree), which is constructed with a single database scan, and a restructuring method with a novel data structure called TIList (Tail-node Information List), in order to process incremental databases more efficiently. We evaluate performance experimentally against state-of-the-art algorithms. The experimental results show that the proposed algorithm processes real datasets more efficiently than previous ones.
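To make the notions of utility and incremental update concrete, the following is a minimal illustrative sketch. It is not HUPID-Growth, HUPID-Tree, or TIList; it is a brute-force table that assumes the common definition of itemset utility (purchased quantity times unit profit, summed over the transactions that contain the whole itemset). The class `UtilityTable`, the `PROFIT` values, and the sample transactions are all hypothetical. The sketch enumerates every subset of each transaction, so it is only practical for tiny examples, but it shows how new transactions can be folded into earlier results without rescanning the original data.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical unit profits per item (the "relative importance" of items).
PROFIT = {"a": 5, "b": 2, "c": 1}


class UtilityTable:
    """Brute-force running table of itemset utilities (illustration only)."""

    def __init__(self, profit):
        self.profit = profit
        self.utility = defaultdict(int)  # itemset (sorted tuple) -> utility

    def add_transactions(self, transactions):
        """Fold transactions into the table; earlier data is not rescanned."""
        for tx in transactions:  # tx maps item -> quantity (non-binary)
            items = sorted(tx)
            for size in range(1, len(items) + 1):
                for itemset in combinations(items, size):
                    self.utility[itemset] += sum(
                        tx[item] * self.profit[item] for item in itemset
                    )

    def high_utility(self, min_util):
        """Return itemsets whose utility is at least min_util."""
        return {s: u for s, u in self.utility.items() if u >= min_util}


# Hypothetical original database and a later increment.
original = [{"a": 1, "b": 2}, {"b": 1, "c": 3}]
increment = [{"a": 2, "c": 1}]

table = UtilityTable(PROFIT)
table.add_transactions(original)
before = table.high_utility(9)

table.add_transactions(increment)  # only the new data is processed
after = table.high_utility(9)
```

In this toy data only the itemset (a, b) reaches the threshold of 9 before the increment, while (a) and (a, c) also qualify afterwards, which is the kind of change an incremental miner has to track. Storing every subset grows exponentially with transaction length, which is one reason practical algorithms rely on compact structures such as trees instead.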

Keywords: Data mining, High utility patterns, Incremental mining, Frequent pattern mining, Static and dynamic databases

Paper URL: https://doi.org/10.1007/s10489-014-0601-6