Hiding sensitive itemsets without side effects

Authors: Surendra H, Mohan H S

Abstract

Data mining techniques discover useful patterns hidden in data, but they can also extract sensitive information, which poses a threat to privacy. Frequent itemset mining is a widely used data mining technique and a pre-processing step for association rule mining. The frequent itemsets it produces may include sensitive itemsets that need to be hidden from adversaries. Traditional data sanitization techniques hide sensitive itemsets by modifying transactions in the database, an approach that suffers from undesired side effects and information loss. In this paper, we propose a pattern sanitization approach that hides sensitive itemsets for privacy-preserving pattern sharing. The transactional database is modeled as a set of lossless compact patterns using closed itemsets. The novelty of the proposed technique lies in sanitizing the closed itemsets (patterns) rather than the transactions in the database. The proposed Recursive Pattern Sanitization (RPS) algorithm hides multiple sensitive itemsets, irrespective of their size and support, in a single pass over the closed patterns. The patterns in the sanitized model retain the closedness property, and the model inherently supports finding frequent itemsets and association rules, which reduces the mining effort required of the end user. Experimental results show that the proposed approach hides sensitive itemsets effectively, without side effects or unexpected information loss, compared with other well-known transaction-modification-based itemset hiding techniques.
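To illustrate why closed itemsets can serve as a lossless compact model of a transactional database, here is a minimal Python sketch. It is not the RPS algorithm, whose details the abstract does not give. It only shows the standard definition of a closed itemset (an itemset equal to the intersection of all transactions that contain it) and the fact that the support of any itemset can be recovered from the closed itemsets alone. The toy database and the function names `closed_itemsets` and `support_from_closed` are assumptions made for this example, and the brute-force enumeration is chosen for readability, not efficiency.

```python
from itertools import combinations


def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def closed_itemsets(transactions):
    """Brute-force closed itemset discovery (illustrative only).

    The closure of an itemset is the intersection of all transactions
    that contain it; closed itemsets are exactly these closures.
    Returns a dict mapping each closed itemset to its support.
    """
    items = sorted(set().union(*transactions))
    closed = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            covering = [t for t in transactions if candidate <= t]
            if not covering:
                continue  # itemset never occurs, so it has no closure
            closure = frozenset.intersection(*covering)
            closed[closure] = len(covering)
    return closed


def support_from_closed(itemset, closed):
    """Recover support from the closed patterns alone: it is the largest
    support among the closed supersets of `itemset` (0 if there are none)."""
    return max((s for c, s in closed.items() if itemset <= c), default=0)


# Hypothetical toy database of four transactions over items a, b, c, d.
transactions = [
    frozenset("abc"),
    frozenset("abc"),
    frozenset("ab"),
    frozenset("abd"),
]
closed = closed_itemsets(transactions)
```

In this toy database, eleven itemsets occur at least once, yet only three closed itemsets are needed to reproduce every support value, which is the sense in which the model is compact and lossless. A pattern sanitization approach operates on such a model instead of on the original transactions.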

Keywords: Data sanitization, Itemset hiding, Pattern sanitization, Privacy-preserving data mining (PPDM), Privacy-preserving data publishing (PPDP)

Paper URL: https://doi.org/10.1007/s10489-018-1329-5