Density-based microaggregation for statistical disclosure control

作者:

Highlights:

摘要

Protection of personal data in statistical databases has recently become a major societal concern. Statistical disclosure control (SDC) is often applied to statistical databases before they are released for public use. Microaggregation for SDC is a family of methods to protect microdata (i.e., records on individuals and/or companies) from individual identification. Microaggregation works by partitioning the microdata into groups of at least k records and, then, replacing the records in each group with the centroid of the group. An optimal microaggregation method must minimize the information loss resulting from this replacement process. However, this problem of minimizing information loss has been shown to be NP-hard for multivariate data. Methods based on various heuristics have been proposed for this problem, but none performs the best for every microdata set and various k values. This work presents a density-based algorithm (DBA) for microaggregation. The DBA first forms groups of records by the descending order of their densities, then fine-tunes these groups in reverse order. The performance of the DBA is compared against the latest microaggregation methods. Experimental results indicate that DBA incurs the least information loss for over half of the test situations.

论文关键词:Mircroaggregation,Disclosure control,k-Anonymity,Microdata protection

论文评审过程:Available online 27 September 2009.

论文官网地址:https://doi.org/10.1016/j.eswa.2009.09.054