A hybrid approach for scalable sub-tree anonymization over big data using MapReduce on cloud


Highlights:

• We investigate the scalability problem of sub-tree anonymization of big data in cloud.

• A hybrid approach combining top–down specialization and bottom–up generalization.

• Innovative MapReduce jobs are designed for computation in bottom–up generalization.

• Multiple generalizations are performed simultaneously in each iteration for scalability.

• Data skewness is integrated when estimating the workload balancing point.
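The highlights above mention bottom–up generalization under the sub-tree scheme, map/reduce-style computation, and several generalizations per iteration. The following is a minimal, single-machine sketch of those ideas, assuming one quasi-identifier and a small taxonomy tree. The taxonomy (`PARENT`), the sample records, and the rule "lift the parent of every group smaller than k" are hypothetical illustrations. This is not the paper's MapReduce job design, and it does not estimate a workload balancing point. Plain Python functions stand in for the map and reduce phases, so no Hadoop is involved.

```python
from collections import Counter

# Hypothetical taxonomy tree for one quasi-identifier ("job"): child -> parent.
PARENT = {
    "Engineer": "Professional", "Lawyer": "Professional",
    "Dancer": "Artist", "Writer": "Artist",
    "Professional": "Any", "Artist": "Any",
}

def proper_ancestors(node):
    """Return the set of strict ancestors of `node` in the taxonomy."""
    result = set()
    while node in PARENT:
        node = PARENT[node]
        result.add(node)
    return result

def map_phase(records, cut):
    """Map: emit (generalized value, 1) for each record under the current cut."""
    for value in records:
        yield cut[value], 1

def reduce_phase(pairs):
    """Reduce: sum the counts per key; each key is one quasi-identifier group."""
    counts = Counter()
    for key, one in pairs:
        counts[key] += one
    return counts

def generalize_subtree(cut, target):
    """Sub-tree generalization: every value below `target` is lifted to `target`,
    so all siblings in that sub-tree are generalized together."""
    return {leaf: (target if target in proper_ancestors(v) else v)
            for leaf, v in cut.items()}

def bottom_up(records, k):
    """Generalize until every quasi-identifier group has at least k records."""
    if len(records) < k:
        raise ValueError("fewer than k records; k-anonymity is impossible")
    cut = {v: v for v in set(records)}  # start from the raw (most specific) values
    while True:
        counts = reduce_phase(map_phase(records, cut))
        violating = [g for g, c in counts.items() if c < k]
        if not violating:
            return cut
        # Several generalizations per iteration: lift the parent of every
        # violating group in one pass (hypothetical selection rule).
        for target in {PARENT[g] for g in violating}:
            cut = generalize_subtree(cut, target)
```

With the records `["Engineer", "Engineer", "Lawyer", "Dancer", "Writer", "Writer"]` and `k = 2`, the groups `Lawyer` and `Dancer` are too small, so `Professional` and `Artist` are applied in the same iteration. The result maps the engineers and lawyers to `Professional` and the dancers and writers to `Artist`, giving two groups of three records each.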


Keywords: Big data, Cloud computing, Data anonymization, Privacy preservation, MapReduce

Review history: Received 25 September 2012, Revised 15 March 2013, Accepted 27 August 2013, Available online 11 February 2014.

Official paper page: https://doi.org/10.1016/j.jcss.2014.02.007