Hiding outliers into crowd: Privacy-preserving data publishing with outliers

Abstract

In recent years, many organizations have published their data in non-aggregated form for research purposes. However, publishing non-aggregated data raises serious privacy concerns. One concern is that outliers in a dataset are easier to distinguish from the crowd, so their privacy is more easily compromised. In this paper, we study the problem of privacy-preserving publishing of datasets that contain outliers. We define the distinguishability-based attack, by which an adversary can identify outliers in an anonymized dataset and reveal their private information. We show that existing syntactic privacy models (e.g., k-anonymity and ℓ-diversity) cannot defend against the distinguishability-based attack. We define plain ℓ-diversity to provide a privacy guarantee to outliers against this attack, and design efficient algorithms that anonymize a dataset to achieve plain ℓ-diversity with low information loss. We extend our anonymization approach to handle the continuous release of a series of datasets that contain outliers. Our experiments demonstrate the efficiency and effectiveness of our approaches.
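
For readers unfamiliar with the syntactic privacy models named in the abstract, the following is a minimal sketch of how the k-anonymity level and the distinct ℓ-diversity level of an already generalized table can be measured, using the standard textbook definitions. It does not implement the paper's plain ℓ-diversity or its anonymization algorithms, which the abstract does not specify. The function names, the column names (`age`, `zip`, `disease`), and the toy records are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def equivalence_classes(records, quasi_identifiers):
    """Group records that share the same quasi-identifier values."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[q] for q in quasi_identifiers)
        groups[key].append(record)
    return groups


def k_anonymity_level(records, quasi_identifiers):
    """Return the largest k such that the table is k-anonymous,
    i.e. the size of the smallest equivalence class."""
    groups = equivalence_classes(records, quasi_identifiers)
    return min(len(group) for group in groups.values())


def distinct_l_diversity_level(records, quasi_identifiers, sensitive):
    """Return the largest l such that the table is distinct l-diverse,
    i.e. the fewest distinct sensitive values in any equivalence class."""
    groups = equivalence_classes(records, quasi_identifiers)
    return min(len({r[sensitive] for r in group}) for group in groups.values())


# Hypothetical generalized table (illustrative data, not from the paper).
table = [
    {"age": "20-29", "zip": "130**", "disease": "flu"},
    {"age": "20-29", "zip": "130**", "disease": "asthma"},
    {"age": "20-29", "zip": "130**", "disease": "flu"},
    {"age": "30-39", "zip": "148**", "disease": "cancer"},
    {"age": "30-39", "zip": "148**", "disease": "flu"},
]

k = k_anonymity_level(table, ["age", "zip"])
l = distinct_l_diversity_level(table, ["age", "zip"], "disease")
print(k, l)
```

Both checks look only at group sizes and at counts of sensitive values within each group. Neither considers how far a record's original values lie from the rest of its group, which is consistent with the abstract's claim that such models do not by themselves protect outliers.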

Keywords: Security, Integrity and protection, Data sharing, Data anonymization, Outliers

Review history: Received 16 August 2013, Revised 8 May 2015, Accepted 30 June 2015, Available online 7 July 2015, Version of Record 10 November 2015.

Official paper URL: https://doi.org/10.1016/j.datak.2015.06.012