Ground truth bias in external cluster validity indices

Authors:

Highlights:

• We identify the ground truth (GT) bias effect for external validation measures, and explain its importance.

• We test and discuss NC bias for 26 popular pair-counting-based validation measures.

• We prove that the Rand index (RI) and four related indices suffer from GT bias.

• We provide theoretical explanations for understanding why and when GT bias happens.

• We present experimental results that support our analysis.

• We present an empirical example to show that the ARI also suffers from a modified GT bias.
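
For readers unfamiliar with pair-counting measures, the following is a minimal sketch of how the RI and the ARI can be computed from the four pair counts over all unordered pairs of objects. It is an illustration only, not the authors' code, and it does not reproduce the paper's bias analysis. The function names (`pair_counts`, `rand_index`, `adjusted_rand_index`) and the label lists are assumptions introduced here; the ARI follows the standard Hubert-Arabie form.

```python
from itertools import combinations


def pair_counts(truth, pred):
    """Count the four pair types over all unordered pairs of objects.

    a: same cluster in both partitions
    b: same cluster in truth only
    c: same cluster in pred only
    d: different clusters in both partitions
    """
    if len(truth) != len(pred):
        raise ValueError("label lists must have the same length")
    a = b = c = d = 0
    for i, j in combinations(range(len(truth)), 2):
        same_truth = truth[i] == truth[j]
        same_pred = pred[i] == pred[j]
        if same_truth and same_pred:
            a += 1
        elif same_truth:
            b += 1
        elif same_pred:
            c += 1
        else:
            d += 1
    return a, b, c, d


def rand_index(truth, pred):
    """Rand index: fraction of pairs on which the two partitions agree."""
    a, b, c, d = pair_counts(truth, pred)
    n_pairs = a + b + c + d
    if n_pairs == 0:
        raise ValueError("need at least two objects")
    return (a + d) / n_pairs


def adjusted_rand_index(truth, pred):
    """Adjusted Rand index (Hubert-Arabie form), written with pair counts."""
    a, b, c, d = pair_counts(truth, pred)
    n_pairs = a + b + c + d
    if n_pairs == 0:
        raise ValueError("need at least two objects")
    expected = (a + b) * (a + c) / n_pairs   # expected value of a under chance
    max_index = ((a + b) + (a + c)) / 2      # upper bound used for normalisation
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are a single cluster
    return (a - expected) / (max_index - expected)


# Hypothetical toy labels: ground truth vs. a candidate partition.
truth = [0, 0, 1, 1]
pred = [0, 1, 0, 1]
print(pair_counts(truth, pred))
print(rand_index(truth, pred), adjusted_rand_index(truth, pred))
```

The quadratic loop over pairs keeps the definition visible; for large data sets a contingency-table formulation would be the more practical choice.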

Keywords: External cluster validity indices, Rand index, Ground truth bias, Quadratic entropy

Review history: Received 18 June 2016, Revised 15 October 2016, Accepted 4 December 2016, Available online 8 December 2016, Version of Record 23 December 2016.

Official paper URL: https://doi.org/10.1016/j.patcog.2016.12.003