Using TF-IDF to hide sensitive itemsets

Authors: Tzung-Pei Hong, Chun-Wei Lin, Kuo-Tung Yang, Shyue-Liang Wang

Abstract

Data mining technology helps extract usable knowledge from large data sets. Collecting and disseminating data, however, carries an inherent risk to privacy. Some sensitive or private information about individuals, businesses, and organizations needs to be suppressed before the data are shared or published. Privacy-preserving data mining (PPDM) has thus become an important issue in recent years. In this paper, we propose an algorithm called SIF-IDF that modifies an original database in order to hide sensitive itemsets. It is a greedy approach based on a concept borrowed from Term Frequency and Inverse Document Frequency (TF-IDF) in text mining. This concept is used to evaluate the degree of similarity between the items in each transaction and the sensitive itemsets to be hidden; the algorithm then selects appropriate items in some transactions to hide. The proposed algorithm can readily achieve a good trade-off between privacy preservation and execution time. Experimental results also show the performance of the proposed approach.
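
The abstract does not give the scoring formula, so the following is only a minimal sketch of the general idea (score transactions with a TF-IDF-style weight, then greedily delete items until each sensitive itemset falls below a support threshold). The function names `support`, `tfidf_style_score`, and `hide_sensitive`, the `min_support` parameter, and the particular weighting are all assumptions made for illustration; they are not the SIF-IDF definitions from the paper.

```python
import math
from collections import Counter


def support(db, itemset):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)


def tfidf_style_score(transaction, exposed, freq, n):
    """Hypothetical TF-IDF-style score (NOT the paper's formula).

    "tf": how many still-exposed sensitive itemsets mention an item,
    normalised by transaction length. "idf": log(1 + n / item frequency).
    """
    score = 0.0
    for item in transaction:
        tf = sum(1 for s in exposed if item in s) / len(transaction)
        idf = math.log(1 + n / freq[item])
        score += tf * idf
    return score


def hide_sensitive(db, sensitive, min_support):
    """Greedily delete items until every sensitive itemset has support
    below `min_support`. Works on a copy; the input is not modified."""
    if min_support < 1:
        raise ValueError("min_support must be at least 1")
    db = [set(t) for t in db]
    n = len(db)
    while True:
        exposed = [s for s in sensitive if support(db, s) >= min_support]
        if not exposed:
            return db
        freq = Counter(item for t in db for item in t)
        # Only transactions that support an exposed itemset are candidates.
        candidates = [i for i, t in enumerate(db)
                      if any(s <= t for s in exposed)]
        best = max(candidates,
                   key=lambda i: tfidf_style_score(db[i], exposed, freq, n))
        supported = [s for s in exposed if s <= db[best]]
        # Delete the item shared by the most supported sensitive itemsets
        # (sorted first so that ties are broken deterministically).
        victim = max(sorted(set().union(*supported)),
                     key=lambda item: sum(item in s for s in supported))
        db[best].discard(victim)


if __name__ == "__main__":
    transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "d"},
                    {"c", "d"}, {"a", "c"}]
    sanitized = hide_sensitive(transactions, [frozenset({"a", "b"})], 2)
    print(sanitized)
```

Each pass removes one item from a transaction that supports an exposed itemset, so the support of at least one such itemset drops by one and the loop terminates. Under this assumed weighting, short transactions dominated by sensitive items are modified first, which tends to limit side effects on non-sensitive items.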

Keywords: Privacy preserving, Data mining, TF-IDF, Greedy approach, Data sanitization

Official paper URL: https://doi.org/10.1007/s10489-012-0377-5