Disclosure risk reduction for generalized linear model output in a remote analysis system

Authors:

Highlights:

Abstract

Remote analysis systems (RAS) allow analysts to obtain statistical results without direct access to the confidential data stored on a secure server. An attacking analyst could send queries to the remote server, collect the outputs of the statistical analyses, and use those outputs in a disclosure attack. Statistical disclosure control (SDC) methods modify RAS outputs to protect confidential information, and perturbation is one of the most commonly adopted of these methods. In generalized linear modelling, random noise is added either to the estimated coefficients or to the associated estimating equations before they are solved. This inflates the variances of the estimators, so some efficiency and utility are lost. Any perturbation-based SDC method can therefore yield an inefficient estimator, with the danger of producing worthless inferences. To date, little attention has been given to systematically controlling both disclosure risk and utility in SDC methods for RAS. In this paper, we develop a framework for the perturbation of estimating equations that enables an RAS to release modified generalized linear model output in such a way that disclosure risk is reduced while good utility is maintained. Finally, we present empirical results demonstrating the application of our framework to obtain estimates from perturbed estimating equations for binary and count response models.
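To make the idea of perturbing an estimating equation concrete, the following is a minimal sketch for a binary response (logistic) model. It is not the framework developed in the paper: the Gaussian noise, the noise scale `sigma`, the simulated data, and the function name `solve_perturbed_logit` are all assumptions chosen for illustration. The sketch solves the score equation X^T(y - mu(beta)) + e = 0 by Newton-Raphson, once with e = 0 and once with random e.

```python
import numpy as np

def solve_perturbed_logit(X, y, noise, n_iter=50, tol=1e-10):
    """Solve X^T (y - mu(beta)) + noise = 0 by Newton-Raphson (hypothetical helper)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))            # fitted probabilities
        score = X.T @ (y - mu) + noise                  # perturbed estimating equation
        info = X.T @ (X * (mu * (1.0 - mu))[:, None])   # Fisher information X^T W X
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Simulated data (assumption: not from the paper).
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
true_beta = np.array([-0.5, 1.0])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ true_beta)))

# Unperturbed fit: ordinary maximum likelihood estimate.
beta_plain = solve_perturbed_logit(X, y, noise=np.zeros(2))

# Perturbed fit: Gaussian noise with an assumed scale added to the score.
sigma = 2.0
e = rng.normal(scale=sigma, size=2)
beta_pert = solve_perturbed_logit(X, y, noise=e)

print("unperturbed:", beta_plain)
print("perturbed:  ", beta_pert)
```

Increasing `sigma` moves the released estimate further from the unperturbed one, which illustrates the trade-off between disclosure protection and estimator efficiency that the abstract describes.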

Keywords: Confidentiality protection, Data utility, Efficiency loss, Generalized linear model, Perturbation, Remote analysis

Review history: Received 13 August 2016, Revised 16 June 2017, Accepted 25 July 2017, Available online 31 July 2017, Version of Record 20 September 2017.

Paper URL: https://doi.org/10.1016/j.datak.2017.07.009