Dare to share: Protecting sensitive knowledge with data sanitization

Abstract

Data sanitization is a process used to promote the sharing of transactional databases among organizations while alleviating the concerns of individual organizations by preserving the confidentiality of their sensitive knowledge, expressed as sensitive association rules. It hides the frequent itemsets corresponding to the sensitive association rules by modifying the sensitive transactions, that is, the transactions that contain those itemsets. The process aims to minimize the impact on the data utility of the sanitized database, so that as much as possible of the non-sensitive knowledge, in the form of non-sensitive association rules, can still be mined from it. We propose three heuristic approaches to the sanitization problem. Results from extensive tests on publicly available real datasets indicate that the approaches are effective and outperform a previous approach in terms of data utility, at the expense of computational speed. The proposed approaches also sanitize the databases with high data accuracy, resulting in little distortion of the released databases. We recommend that the database owner sanitize the database using the third proposed approach, which is a hybrid.
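The abstract describes sanitization only at a high level. The following is a minimal sketch of the general idea, not one of the three heuristics proposed in the paper. It hides a sensitive itemset by deleting one of its items from just enough supporting transactions to push its support below the minimum support threshold. It then counts the non-sensitive frequent itemsets lost, as a rough data-utility measure. The victim-item rule (the lowest-support item), the shortest-transaction-first ordering, the brute-force itemset enumeration, and all function names are hypothetical choices made for illustration.

```python
from itertools import combinations


def support(db, itemset):
    """Number of transactions that contain every item of `itemset`."""
    s = set(itemset)
    return sum(1 for t in db if s <= t)


def frequent_itemsets(db, min_sup, max_len=3):
    """Brute-force enumeration of frequent itemsets (toy data only)."""
    items = sorted(set().union(*db))
    result = set()
    for k in range(1, max_len + 1):
        for combo in combinations(items, k):
            if support(db, combo) >= min_sup:
                result.add(frozenset(combo))
    return result


def sanitize(db, sensitive, min_sup):
    """Hide each sensitive itemset by removing one of its items from
    supporting (sensitive) transactions until support < min_sup.

    Hypothetical heuristics: remove the item of the itemset with the lowest
    support, and modify the shortest supporting transactions first.
    """
    db = [set(t) for t in db]  # work on a copy; the input is left untouched
    for itemset in sensitive:
        s = set(itemset)
        supporting = sorted((t for t in db if s <= t), key=len)
        # Transactions to modify so that support ends at min_sup - 1
        excess = max(len(supporting) - min_sup + 1, 0)
        victim = min(sorted(s), key=lambda i: support(db, [i]))
        for t in supporting[:excess]:
            t.discard(victim)  # mutates the set stored in db
    return db


def lost_itemsets(original, sanitized, sensitive, min_sup, max_len=3):
    """Non-sensitive frequent itemsets that are no longer frequent.

    Supersets of a sensitive itemset are treated as sensitive too.
    """
    before = frequent_itemsets(original, min_sup, max_len)
    after = frequent_itemsets(sanitized, min_sup, max_len)
    sens = [frozenset(s) for s in sensitive]
    non_sensitive = {f for f in before if not any(s <= f for s in sens)}
    return non_sensitive - after


if __name__ == "__main__":
    db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c", "d"}]
    clean = sanitize(db, [("a", "b")], min_sup=2)
    print(support(clean, ("a", "b")))                          # → 1
    print(lost_itemsets(db, clean, [("a", "b")], min_sup=2))   # → set()
```

On this toy database, {a, b} drops from support 3 to 1, and no non-sensitive frequent itemset is lost. On real data, the choice of victim item and transactions drives the trade-off between hiding the sensitive itemsets and preserving data utility, which the abstract identifies as the central concern.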

Keywords: Data mining, Sensitive knowledge protection, Data sanitization, Data utility

Article history: Received 21 August 2005, revised 17 August 2006, accepted 21 August 2006, available online 2 October 2006.

Paper URL: https://doi.org/10.1016/j.dss.2006.08.007