Differentially private data publishing via optimal univariate microaggregation and record perturbation

Authors:

Highlights:

Abstract

We present an approach to generating differentially private data sets that consists of adding noise to a microaggregated version of the original data set. While this idea has already been pursued in the literature to reduce the sensitivity of attributes, and hence the noise required to reach differential privacy, the novelty of our approach is that we treat the microaggregated data set as our protection target (rather than aiming to protect the original data set and viewing the microaggregated data set as a mere intermediate step). Interestingly, by starting from the microaggregated data set rather than the original data set, we achieve differential privacy for the individuals who contributed the original records while preserving substantially more utility. Compared with previous contributions that use microaggregation as a prior step to reach differential privacy, the utility improvement comes from avoiding the need for insensitive microaggregation. This claim is supported by theoretical and empirical utility comparisons between our approach and existing approaches. We analyze several microaggregation strategies: multivariate MDAV, individual-ranking MDAV, and optimal microaggregation. In particular, we reformulate optimal microaggregation to suit the generation of differentially private data sets.
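
The two-stage idea described in the abstract (microaggregate first, then perturb the microaggregated records) can be illustrated with a minimal Python sketch for a single numeric attribute. This is not the authors' algorithm: it groups sorted values into fixed-size blocks of at least k records rather than computing the optimal partition, and the function names (`microaggregate`, `perturb`) and the `noise_scale` parameter are hypothetical. In particular, `noise_scale` is left as a free input here; calibrating it to the sensitivity and privacy budget so that the output is differentially private is the subject of the paper and is not reproduced in this sketch.

```python
import random


def microaggregate(values, k):
    """Replace each value by the mean of its group in sorted order.

    Simple fixed-size grouping (an assumption of this sketch, not the
    optimal univariate partition): consecutive blocks of k sorted values,
    with the last block absorbing any leftover records.
    """
    if k < 1 or len(values) < k:
        raise ValueError("need k >= 1 and at least k values")
    # Indices of the records ordered by attribute value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    n_groups = len(values) // k
    result = [0.0] * len(values)
    for g in range(n_groups):
        start = g * k
        # The last group takes the remainder so every group has >= k members.
        end = (g + 1) * k if g < n_groups - 1 else len(values)
        members = order[start:end]
        centroid = sum(values[i] for i in members) / len(members)
        for i in members:
            result[i] = centroid
    return result


def perturb(values, noise_scale, rng):
    """Add independent Laplace(0, noise_scale) noise to each value.

    The difference of two i.i.d. exponential variables with mean
    noise_scale follows a Laplace distribution with that scale.
    noise_scale is a hypothetical, uncalibrated parameter.
    """
    if noise_scale <= 0:
        raise ValueError("noise_scale must be positive")
    rate = 1.0 / noise_scale
    return [v + rng.expovariate(rate) - rng.expovariate(rate) for v in values]


if __name__ == "__main__":
    ages = [10, 1, 12, 2, 11, 3]
    centroids = microaggregate(ages, k=3)
    released = perturb(centroids, noise_scale=0.5, rng=random.Random(0))
    print(centroids)
    print(released)
```

Passing a seeded `random.Random` instance keeps the sketch reproducible for experimentation; a real release would need a cryptographically sound noise source and a properly derived noise scale.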

Keywords: Differential privacy, Microaggregation, Anonymization, Statistical disclosure control, Privacy

Article history: Received 22 June 2017, Revised 20 April 2018, Accepted 21 April 2018, Available online 24 April 2018, Version of Record 11 May 2018.

Paper URL: https://doi.org/10.1016/j.knosys.2018.04.027