Multi-core parallel algorithms for hiding high-utility sequential patterns

Authors:

Highlights:

Abstract

High-utility sequential pattern mining (HUSPM) is used in many applications, such as retail, market basket analysis, click-stream analysis, healthcare data analysis, and bioinformatics. HUSPM algorithms discover useful information from data. However, the same algorithms can expose sensitive patterns when competitors run them on leaked data. High-utility sequential pattern hiding (HUSPH) is therefore used to protect private information from HUSPM algorithms. This paper proposes three algorithms for hiding high-utility sequential patterns in quantitative sequence datasets: High Utility Sequential Pattern Hiding Using Pure Array Structure (USHPA), High Utility Sequential Pattern Hiding Using Parallel Strategy (USHP), and High Utility Sequential Pattern Hiding Using Random Distribution Strategy (USHR). These algorithms use a proposed data structure, Pattern Utility Set for Hiding (PUSH), to speed up the hiding process. We also introduce a metric called Privacy Factor to evaluate the quality of the hiding results. Comparative experiments were conducted on real datasets to evaluate the proposed algorithms in terms of runtime, memory consumption, scalability, missing cost, and privacy factor. The results show that the proposed algorithms sanitize the input datasets efficiently and outperform the compared algorithms on all metrics.
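To make the hiding task concrete, the following is a minimal Python sketch of what "hiding" a high-utility sequential pattern can mean on a quantitative sequence dataset. It is not USHPA, USHP, or USHR, and it does not use the PUSH structure; it is a naive greedy baseline under commonly used HUSPM assumptions: each sequence is an ordered list of itemsets mapping items to positive integer quantities, each item has a positive unit profit, and the utility of a pattern in a sequence is the maximum utility over its occurrences. All names (`pattern_utility`, `database_utility`, `hide_pattern`, `min_util`) and the toy data are hypothetical.

```python
import copy
from typing import Dict, FrozenSet, List

QSequence = List[Dict[str, int]]   # ordered itemsets: item -> quantity
Pattern = List[FrozenSet[str]]     # ordered itemsets of items


def pattern_utility(seq: QSequence, pattern: Pattern, profit: Dict[str, int]) -> int:
    """Maximum utility over all occurrences of `pattern` in `seq` (0 if none)."""
    # best[j] = best utility of matching the first j pattern itemsets so far
    best = [0] + [None] * len(pattern)
    for itemset in seq:
        # Descending j: one sequence itemset matches at most one pattern itemset.
        for j in range(len(pattern), 0, -1):
            if best[j - 1] is None:
                continue
            if all(item in itemset for item in pattern[j - 1]):
                gain = sum(itemset[item] * profit[item] for item in pattern[j - 1])
                cand = best[j - 1] + gain
                if best[j] is None or cand > best[j]:
                    best[j] = cand
    return best[-1] or 0


def database_utility(db: List[QSequence], pattern: Pattern, profit: Dict[str, int]) -> int:
    """Utility of `pattern` in the whole dataset: sum over its sequences."""
    return sum(pattern_utility(seq, pattern, profit) for seq in db)


def hide_pattern(db: List[QSequence], pattern: Pattern,
                 profit: Dict[str, int], min_util: int) -> List[QSequence]:
    """Naive sanitization (assumed baseline, not the paper's method):
    lower quantities of the pattern's items until its utility < min_util."""
    if min_util <= 0:
        raise ValueError("min_util must be positive")
    db = copy.deepcopy(db)                  # leave the input dataset untouched
    targets = set().union(*pattern)
    while database_utility(db, pattern, profit) >= min_util:
        # Work on the sequence that currently contributes the most utility.
        seq = max(db, key=lambda s: pattern_utility(s, pattern, profit))
        for itemset in seq:
            hit = next((item for item in itemset if item in targets), None)
            if hit is not None:
                itemset[hit] -= 1           # reduce one unit of quantity
                if itemset[hit] == 0:
                    del itemset[hit]        # the item disappears from the itemset
                break
    return db


# Hypothetical toy data for illustration only.
profit = {"a": 2, "b": 5, "c": 1}
db = [
    [{"a": 1, "b": 2}, {"c": 3}, {"a": 4}],
    [{"a": 2}, {"b": 1, "c": 2}, {"a": 1}],
]
sensitive = [frozenset({"a"}), frozenset({"c"})]
sanitized = hide_pattern(db, sensitive, profit, min_util=8)
```

A greedy loop like this recomputes utilities from scratch on every step and ignores the damage done to non-sensitive patterns, which is the kind of cost that metrics such as runtime and missing cost are meant to capture.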

Keywords: High-utility sequential pattern mining, High-utility sequential pattern hiding, Parallel hiding, Privacy preserving utility mining

Review history: Received 31 December 2020, Revised 19 November 2021, Accepted 20 November 2021, Available online 3 December 2021, Version of Record 21 December 2021.

Official paper URL: https://doi.org/10.1016/j.knosys.2021.107793