New approaches for mining regular high utility sequential patterns

Authors: Sabrina Zaman Ishita, Chowdhury Farhan Ahmed, Carson K. Leung

Abstract

Regular pattern mining has emerged as one of the promising sub-domains of data mining by discovering patterns that occur regularly throughout a complete database. In contrast, utility-based pattern mining considers non-binary frequencies of items along with their importance values, and hence reveals more significant information than traditional frequent pattern mining. Though regular patterns carry interesting knowledge, considering the utility values of the patterns would unveil more interesting and practical information. In sequence databases, the task of mining regular high utility patterns is both more useful and more challenging. In the current era of big data, handling the incremental nature of databases, so as to avoid mining from scratch when new updates appear, will bring effective results in many applications. Moreover, databases can be dynamically updated in the form of data streams, where new batches of data are added to the database at a high rate. A window consisting of several recent batches can be of great interest to some end-users. To address all these problems, we first introduce the concept of regular high utility sequential patterns and develop an algorithm for mining these patterns from static databases. Afterwards, we extend our algorithm to mine regular high utility sequential patterns from incremental databases and sliding-window-based data streams. These two extended approaches produce approximate results in order to generate the intended patterns faster and thus boost performance. Extensive performance analyses of all the algorithms were carried out over several real-life datasets, and impressive results were found compared to existing research.

Keywords: Data mining, Regular pattern, High utility pattern, Sequential pattern, Incremental databases, Data streams

Paper URL: https://doi.org/10.1007/s10489-021-02536-7