Gradient estimation of information measures in deep learning

Authors:

Highlights:

Abstract

Information measures, including entropy and mutual information (MI), have been widely applied in deep learning. Despite these successes, existing estimation methods suffer from either high variance or high bias, which can lead to unstable training or poor performance. Since estimating information measures themselves is very difficult, we explore an appealing alternative strategy: directly estimating the gradients of information measures with respect to model parameters. We propose a general gradient estimation method for information measures based on score estimation. Specifically, we establish the Entropy Gradient Estimator (EGE) and the Mutual Information Gradient Estimator (MIGE) to estimate the gradients of entropy and mutual information with respect to model parameters, respectively. To optimize entropy or mutual information, we can directly plug the gradient approximation for the relevant parameters into stochastic backpropagation, yielding stable and efficient training. Our proposed method exhibits higher accuracy and lower variance in gradient estimation of information measures. Extensive experiments on various deep learning tasks demonstrate the superiority of our method.
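To make the plug-in idea concrete, below is a minimal, illustrative sketch of entropy gradient estimation in PyTorch; it is not the paper's implementation. It assumes a kernel-based Stein score estimator (in the spirit of Li & Turner, 2018) in place of whatever score estimator EGE actually uses, and a toy reparameterized Gaussian generator; the `bandwidth`, `ridge`, sample size, and the generator `g_theta` are all assumptions made for illustration. The key step is the surrogate loss: holding the estimated score fixed and backpropagating through the samples realizes the identity grad_theta H(X) = -E[(dg_theta/dtheta)^T grad_x log p(x)].

```python
import torch
import torch.nn.functional as F

def stein_score(x, bandwidth=1.0, ridge=1e-3):
    # Kernel-based (Stein) estimate of the score grad_x log p(x) from samples.
    # x: (n, d) tensor of samples; returns an (n, d) tensor of score estimates.
    # bandwidth and ridge are assumed hyperparameters, not values from the paper.
    diff = x.unsqueeze(1) - x.unsqueeze(0)                      # (n, n, d): x_i - x_j
    K = torch.exp(-diff.pow(2).sum(-1) / (2 * bandwidth ** 2))  # RBF Gram matrix
    grad_K = (K.unsqueeze(-1) * diff).sum(1) / bandwidth ** 2   # sum_j K_ij (x_i - x_j) / h^2
    n = x.shape[0]
    return -torch.linalg.solve(K + ridge * torch.eye(n), grad_K)

# Toy reparameterized model (hypothetical): x = g_theta(eps) = softplus(theta) * eps,
# eps ~ N(0, I), so x is Gaussian with per-dimension scale softplus(theta).
theta = torch.zeros(2, requires_grad=True)
eps = torch.randn(1024, 2)
x = F.softplus(theta) * eps

# EGE-style surrogate: grad_theta H(X) = -E[(dg_theta/dtheta)^T grad_x log p(x)].
# The score is estimated without tracking gradients, then treated as a constant.
with torch.no_grad():
    scores = stein_score(x)
surrogate = -(scores * x).sum(dim=-1).mean()
surrogate.backward()

# Rough sanity check: for this Gaussian, the analytic entropy gradient is
# sigmoid(theta) / softplus(theta) per dimension (about 0.72 at theta = 0),
# which theta.grad should approximately match.
print(theta.grad)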

Keywords: Entropy, Mutual information, Score estimation, Gradient estimation

Article history: Received 19 December 2020, Revised 10 April 2021, Accepted 12 April 2021, Available online 27 April 2021, Version of Record 29 April 2021.

DOI: https://doi.org/10.1016/j.knosys.2021.107046