Efficient sanitization of informative association rules

Abstract

Recent developments in privacy-preserving data mining have produced many efficient and practical techniques for preventing sensitive patterns or information from being discovered by data mining algorithms. For hiding association rules, current approaches require the rules or patterns to be hidden to be specified in advance. In addition, Apriori-based techniques [Verykios, V., Elmagarmid, A., Bertino, E., Saygin, Y., & Dasseni, E. (2004). Association rules hiding. IEEE Transactions on Knowledge and Data Engineering, 16(4), 434–447] require multiple scans of the entire database. Techniques that sanitize itemsets directly in transactions [Oliveira, S., & Zaiane, O. (2003). An efficient one-scan sanitization for improving the balance between privacy and knowledge discovery. Technical report TR 03-15, Department of Computing Science, University of Alberta, Canada] scan each window of the database once and process each window independently. However, they do not take into account the information accumulated across windows. In this work, we propose an efficient sanitization algorithm that sanitizes informative association rules with only one database scan. For a given predicting item, an informative association rule set [Li, J., Shen, H., & Topor, R. (2001). Mining the smallest association rule set for predictions. In Proceedings of the 2001 IEEE International Conference on Data Mining (pp. 361–368)] is the smallest association rule set that makes the same predictions as the entire association rule set under confidence priority. A new data structure, called the pattern-inversion tree, is proposed to store the related information so that only one database scan is required. The preprocessing step of finding these informative association rules can be integrated into the sanitization process. Numerical experiments show that the proposed algorithm is more efficient than previous algorithms while producing similar side effects.
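To make the two notions in the abstract concrete, the following is a brute-force sketch, not the authors' pattern-inversion tree algorithm. It assumes a simplified reading of the informative rule set: for a predicting item `c`, a rule `X -> c` is dropped when some more general rule `X' -> c` (with `X'` a proper subset of `X`) has confidence at least as high. It also shows one generic way to hide a rule, lowering its confidence by deleting the predicting item from supporting transactions. The function names `informative_rules` and `hide_rule` and the `max_len` bound on antecedent size are hypothetical. The counting step makes a single pass over the data, but the hiding step here makes a second pass, unlike the one-scan approach the abstract describes.

```python
from collections import Counter
from itertools import combinations


def informative_rules(transactions, target, min_sup, min_conf, max_len=2):
    """Brute-force illustration (hypothetical helper, not the paper's method).

    One pass counts every antecedent X (up to max_len items, target excluded)
    and how often X occurs together with the predicting item `target`.
    """
    n = len(transactions)
    ant, both = Counter(), Counter()
    for t in transactions:
        items = sorted(set(t) - {target})
        has_target = target in t
        for k in range(1, max_len + 1):
            for x in combinations(items, k):
                ant[x] += 1
                if has_target:
                    both[x] += 1

    # Keep rules X -> target that meet both thresholds.
    rules = {
        x: both[x] / cnt
        for x, cnt in ant.items()
        if both[x] / n >= min_sup and both[x] / cnt >= min_conf
    }

    # Drop a rule if a more general rule predicts the same item with
    # confidence at least as high (assumed simplified definition).
    return {
        x: conf
        for x, conf in rules.items()
        if not any(set(y) < set(x) and rules[y] >= conf for y in rules)
    }


def hide_rule(transactions, antecedent, target, min_conf):
    """Generic confidence-lowering sanitization (hypothetical helper).

    Removes `target` from transactions supporting antecedent + target until
    conf(antecedent -> target) < min_conf. Returns a modified copy.
    """
    sanitized = [set(t) for t in transactions]
    a = set(antecedent)
    n_ant = sum(1 for t in sanitized if a <= t)  # unchanged by removing target
    supporting = [t for t in sanitized if a <= t and target in t]
    n_both = len(supporting)
    for t in supporting:
        if n_ant == 0 or n_both / n_ant < min_conf:
            break
        t.discard(target)  # mutates the set inside `sanitized`
        n_both -= 1
    return sanitized


if __name__ == "__main__":
    db = [{"a", "b", "c"}, {"a", "c"}, {"a", "b", "c"},
          {"b", "c"}, {"a", "b"}, {"a"}]
    print(informative_rules(db, "c", min_sup=0.3, min_conf=0.6))
    print(hide_rule(db, ("b",), "c", min_conf=0.6))
```

On this toy database the rule `{a, b} -> c` (confidence 2/3) is discarded because the more general rule `{b} -> c` has confidence 3/4. Hiding `{b} -> c` at a 0.6 threshold removes `c` from one supporting transaction, bringing its confidence to 2/4.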
The running-time complexity of the algorithm is presented and compared with that of a similar algorithm, showing that the proposed algorithm has better complexity.