Comparison of microaggregation approaches on anonymized data quality

Authors:

Abstract

Microaggregation is commonly used to protect microdata against the re-identification of individuals: records are anonymized so that the resulting dataset (the anonymized dataset) satisfies the k-anonymity constraint. Because anonymization degrades data quality, an effective microaggregation approach must preserve enough quality for the anonymized dataset to remain useful for further analysis. The performance of a microaggregation approach should therefore be measured by the quality of the anonymized dataset it generates. Previous studies usually measure this quality by information loss. This study takes a different approach. Since an anonymized dataset should support further analysis, this study first builds a classifier from the anonymized dataset and then uses the prediction accuracy of that classifier to represent the quality of the anonymized dataset. Performance results indicate that low information loss does not necessarily translate into high prediction accuracy, and vice versa. This is particularly true when the information losses of the two anonymized datasets being compared do not differ significantly.
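The two quality measures contrasted in the abstract can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the grouping heuristic (sort by attribute sum, then cut into chunks of k) is a simplified stand-in for a real microaggregation algorithm, information loss is taken as the common SSE/SST ratio, the classifier is a plain 1-nearest-neighbour rule, class labels are left unmodified, and the data are synthetic.

```python
import random

def microaggregate(records, k):
    """Replace each record by the centroid of a group of at least k records.
    Assumed heuristic: sort by attribute sum and cut into chunks of k."""
    order = sorted(range(len(records)), key=lambda i: sum(records[i]))
    groups = [order[i:i + k] for i in range(0, len(order), k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        tail = groups.pop()            # merge a short tail into the previous group
        groups[-1].extend(tail)
    dim = len(records[0])
    anonymized = [None] * len(records)
    for group in groups:
        centroid = tuple(sum(records[i][d] for i in group) / len(group)
                         for d in range(dim))
        for i in group:
            anonymized[i] = centroid
    return anonymized

def information_loss(original, anonymized):
    """SSE / SST: squared distance to the published values, relative to the
    squared distance of the original records to the dataset mean."""
    n, dim = len(original), len(original[0])
    mean = [sum(r[d] for r in original) / n for d in range(dim)]
    sse = sum((o[d] - a[d]) ** 2
              for o, a in zip(original, anonymized) for d in range(dim))
    sst = sum((o[d] - mean[d]) ** 2 for o in original for d in range(dim))
    return sse / sst

def nn_accuracy(train_x, train_y, test_x, test_y):
    """Accuracy of a 1-nearest-neighbour classifier built on train_x."""
    def predict(x):
        nearest = min(range(len(train_x)),
                      key=lambda i: sum((a - b) ** 2
                                        for a, b in zip(train_x[i], x)))
        return train_y[nearest]
    hits = sum(predict(x) == y for x, y in zip(test_x, test_y))
    return hits / len(test_x)

# Synthetic two-class data (hypothetical, for illustration only).
rng = random.Random(0)
data = [(rng.gauss(c * 3, 1), rng.gauss(c * 3, 1))
        for c in (0, 1) for _ in range(50)]
labels = [c for c in (0, 1) for _ in range(50)]
train_x, train_y = data[0::2], labels[0::2]
test_x, test_y = data[1::2], labels[1::2]

k = 5
anon = microaggregate(train_x, k)                  # anonymize attributes only
loss = information_loss(train_x, anon)
acc = nn_accuracy(anon, train_y, test_x, test_y)   # train on anonymized data
print(f"k={k}  information loss={loss:.3f}  accuracy={acc:.3f}")
```

Running the same comparison for two different grouping heuristics at a fixed k would yield one (information loss, accuracy) pair per heuristic, which is the kind of pairing the abstract argues need not rank the approaches in the same order.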

Keywords: Microaggregation, Disclosure control, k-Anonymity, Information loss

Review history: Available online 2 June 2010.

Official paper URL: https://doi.org/10.1016/j.eswa.2010.05.071