FastRCA-Seq: An efficient approach for extracting hierarchies of multilevel closed partially-ordered patterns

作者:

Highlights:

摘要

Discovering concise representations of sequential patterns in sequential data is a well-established data mining task. Recently, Nica et al. have put forward an original approach RCA-Seq for directly extracting a hierarchy of multilevel closed partially-ordered patterns (MCPO-patterns) from a sequence database within the Relational Concept Analysis (RCA) framework. RCA-Seq has been applied successfully to small (∼1,000 sequences) but interesting real hydro-ecological datasets. However, RCA-Seq only focuses on providing comprehensible results to the detriment of performance. To improve the performance of RCA-Seq, we propose a new approach FastRCA-Seq that stems from RCA-Seq, and whose contributions are beneficial for two fields: Formal Concept Analysis, namely the RCA extension, and sequential pattern mining. FastRCA-Seq spans two key steps: the exploration of sequential data based on RCA, and the extraction of MCPO-patterns by navigating the RCA result. Firstly, our approach introduces an effective RCA implementation based on bit-array representations, bitwise operations, parallel computing, and several new properties of RCA that may prevent expensive computations. In addition, we state the bottleneck of RCA. Secondly, FastRCA-Seq is a self-contained approach for directly and efficiently mining hierarchies of MCPO-patterns from sequential data. We assess FastRCA-Seq on various benchmark datasets, precisely Gazelle, Kosarak, and FIFA. The results show that FastRCA-Seq outperforms RCA-Seq in terms of execution time (in average ∼169 times faster) and memory usage (in average with ∼42% less) while preserving the benefits of interpretability and usability of results by stakeholders.

论文关键词:Sequential data,Relational Concept Analysis,Multilevel closed partially-ordered patterns,Hierarchy of patterns,Performance

论文评审过程:Received 25 January 2020, Revised 10 August 2020, Accepted 12 October 2020, Available online 16 October 2020, Version of Record 16 October 2020.

论文官网地址:https://doi.org/10.1016/j.knosys.2020.106533