Using deep learning to preserve data confidentiality

作者:Wei Li, Pengqiu Meng, Yi Hong, Xiaohui Cui

摘要

Preserving data confidentiality is crucial when releasing microdata for public-use. There are a variety of proposed approaches; many of them are based on traditional probability theory and statistics. These approaches mainly focus on masking the original data. In practice, these masking techniques, despite covering part of the data, risk leaving sensitive data open to release. In this paper, we approach this problem using a deep learning-based generative model which generates simulation data to mask the original data. Generating simulation data that holds the same statistical characteristics as the raw data becomes the key idea and also the main challenge in this study. In particular, we explore the statistical similarities between the raw data and the generated data, given that the generated data and raw data are not obviously distinguishable. Two statistical evaluation metrics, Absolute Relative Residual Values and Hellinger Distance, are the evaluation methods we have decided upon to evaluate our results. We also conduct extensive experiments to validate our idea with two real-world datasets: the Census Dataset and the Environmental Dataset.

论文关键词:Data confidentiality, Generative model, Statistical evaluation metric

论文评审过程:

论文官网地址:https://doi.org/10.1007/s10489-019-01515-3