The randomized information coefficient: assessing dependencies in noisy data

Authors: Simone Romano, Nguyen Xuan Vinh, Karin Verspoor, James Bailey

Abstract

When differentiating between strong and weak relationships using information-theoretic measures, the variance plays an important role: the higher the variance, the lower the chance of correctly ranking the relationships. We propose the randomized information coefficient (RIC), a mutual-information-based measure with low variance, to quantify the dependency between two sets of numerical variables. We first formally establish the importance of achieving low variance when comparing relationships using the mutual information estimated with grids. Second, we experimentally demonstrate the effectiveness of RIC for (i) detecting noisy dependencies and (ii) ranking dependencies for the applications of genetic network inference and feature selection for regression. Across these tasks, RIC is very competitive with 16 other state-of-the-art measures. Other prominent features of RIC include its simplicity and efficiency, making it a promising new method for dependency assessment.
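To make the idea of averaging grid-based mutual information over random discretizations concrete, the following is a minimal Python sketch for two single variables. It is an illustration only and not the authors' implementation: the function names, the rule for choosing cut-points (sampled from the data), the default cap on the number of bins, and the normalization by the larger marginal entropy are all assumptions made here, and the abstract does not specify any of them.

```python
import numpy as np


def _entropy(counts):
    """Shannon entropy (in nats) of a histogram given as an array of counts."""
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log(p)).sum())


def _random_grid_labels(values, n_bins, rng):
    """Assign each value to a bin whose cut-points are sampled from the data."""
    cuts = np.sort(rng.choice(values, size=n_bins - 1, replace=False))
    return np.searchsorted(cuts, values, side="right")


def randomized_nmi(x, y, n_grids=50, max_bins=None, seed=0):
    """Average normalized mutual information over random grids (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    if max_bins is None:
        # Assumed default: cap the bins per axis so cells stay reasonably populated.
        max_bins = max(2, int(np.sqrt(n) / 2))
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_grids):
        # Draw a random number of bins for each axis, between 2 and max_bins.
        bx = int(rng.integers(2, max_bins + 1))
        by = int(rng.integers(2, max_bins + 1))
        lx = _random_grid_labels(x, bx, rng)
        ly = _random_grid_labels(y, by, rng)
        # Build the contingency table of the two discretizations.
        joint = np.zeros((bx, by))
        np.add.at(joint, (lx, ly), 1)
        hx = _entropy(joint.sum(axis=1))
        hy = _entropy(joint.sum(axis=0))
        hxy = _entropy(joint.ravel())
        mi = hx + hy - hxy
        denom = max(hx, hy)
        scores.append(mi / denom if denom > 0 else 0.0)
    # Averaging over many random grids is what reduces the variance of the estimate.
    return float(np.mean(scores))
```

Under these assumptions, calling `randomized_nmi(x, y)` on a noisy functional relationship should return a larger score than calling it on two independent samples, and increasing `n_grids` should make the score more stable across seeds.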

Keywords: Dependency measures, Noisy relationships, Normalized mutual information, Randomized ensembles


Paper URL: https://doi.org/10.1007/s10994-017-5664-2