Fast clustering-based anonymization approaches with time constraints for data streams

Authors:

Highlights:

Abstract

Research on the anonymization of static data has made great progress in recent years. Generalization and suppression are two common techniques for anonymizing quasi-identifiers. However, the characteristics of data streams, such as potential infinity and high dynamicity, make anonymizing them different from anonymizing static data, so methods designed for static data cannot be applied directly to data streams. In this paper, a novel clustering-based k-anonymization approach for data streams is proposed. To speed up the anonymization process and reduce information loss, the new approach scans a stream in a single pass to recognize and reuse clusters that satisfy the k-anonymity principle. The time constraints on tuple publication and cluster reuse, which are specific to data streams, are considered as well. Furthermore, the approach is extended to conform to the ℓ-diversity principle. Experiments conducted on real datasets show that the proposed methods are both efficient and effective.
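
The abstract gives no pseudocode, so the following Python sketch is only an illustration of the general idea it describes: a single pass over the stream, a deadline on how long a tuple may wait before publication, and a time limit on reusing clusters that already satisfied k-anonymity. It is not the authors' algorithm. The class and parameter names (StreamAnonymizer, delay, reuse_ttl), the restriction to numeric quasi-identifiers, the Manhattan distance, and the choice to suppress a tuple when fewer than k tuples are buffered are all assumptions made for this example. The ℓ-diversity extension is not covered.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    """A published cluster, kept as one [low, high] range per quasi-identifier."""
    lows: tuple
    highs: tuple
    created: int  # time at which the cluster was published

    def covers(self, qi):
        return all(lo <= v <= hi for lo, v, hi in zip(self.lows, qi, self.highs))


class StreamAnonymizer:
    """Hypothetical one-pass k-anonymizer with publication and reuse deadlines."""

    def __init__(self, k, delay, reuse_ttl):
        self.k = k                  # minimum cluster size
        self.delay = delay          # assumed maximum wait before a tuple is output
        self.reuse_ttl = reuse_ttl  # assumed lifetime of a reusable cluster
        self.buffer = []            # pending (arrival_time, qi) pairs, oldest first
        self.reusable = []          # clusters that already satisfied k-anonymity

    def process(self, now, qi):
        """Consume one tuple; return a list of (qi, ranges) pairs published now.

        ranges is None when the tuple is suppressed.
        """
        out = []
        # Time constraint on reuse: forget clusters that are too old.
        self.reusable = [c for c in self.reusable
                         if now - c.created <= self.reuse_ttl]
        for c in self.reusable:
            if c.covers(qi):
                # Reuse an existing generalization and publish immediately.
                out.append((qi, tuple(zip(c.lows, c.highs))))
                break
        else:
            self.buffer.append((now, qi))
        # Time constraint on publication: the oldest tuple must leave in time.
        while self.buffer and now - self.buffer[0][0] >= self.delay:
            out.extend(self._flush_oldest(now))
        return out

    def _flush_oldest(self, now):
        seed = self.buffer[0][1]
        if len(self.buffer) < self.k:
            self.buffer.pop(0)
            return [(seed, None)]  # too few tuples to form a cluster: suppress
        # Group the expiring tuple with its k - 1 nearest buffered neighbours.
        order = sorted(
            range(len(self.buffer)),
            key=lambda i: sum(abs(a - b) for a, b in zip(self.buffer[i][1], seed)),
        )
        chosen = set(order[:self.k])
        members = [self.buffer[i][1] for i in sorted(chosen)]
        self.buffer = [b for i, b in enumerate(self.buffer) if i not in chosen]
        # Generalize each quasi-identifier to the range spanned by the members.
        lows = tuple(min(col) for col in zip(*members))
        highs = tuple(max(col) for col in zip(*members))
        self.reusable.append(Cluster(lows, highs, now))
        ranges = tuple(zip(lows, highs))
        return [(m, ranges) for m in members]
```

In this sketch a tuple that falls inside a live cluster's ranges is published at once with that cluster's generalization, which is where reuse saves both clustering time and further widening of the ranges. A smaller delay lowers latency but leaves fewer buffered candidates per cluster, which tends to widen ranges or force suppression.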

Keywords: Anonymization, Clustering, Data stream, Generalization, Suppression

Review history: Received 6 May 2012, Revised 16 February 2013, Accepted 15 March 2013, Available online 26 March 2013.

Official paper URL: https://doi.org/10.1016/j.knosys.2013.03.007