Technical Note: Bias in Information-Based Measures in Decision Tree Induction

Authors: Allan P. White, Wei Zhong Liu

Abstract

A fresh look is taken at the problem of bias in information-based attribute selection measures, used in the induction of decision trees. The approach uses statistical simulation techniques to demonstrate that the usual measures such as information gain, gain ratio, and a new measure recently proposed by Lopez de Mantaras (1991) are all biased in favour of attributes with large numbers of values. It is concluded that approaches which utilise the chi-square distribution are preferable because they compensate automatically for differences between attributes in the number of levels they take.
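The bias described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' experimental design: the sample size, number of classes, trial count, and function names are all assumptions chosen for illustration. It generates attributes that are statistically independent of the class, so any information gain measured is pure noise, and shows that the average gain still rises with the number of attribute values. For comparison it prints the standard large-sample approximation: under independence, 2 N ln(2) times the gain (in bits) is the likelihood-ratio statistic, which is approximately chi-square distributed with (k - 1)(c - 1) degrees of freedom, so the expected gain is roughly (k - 1)(c - 1) / (2 N ln 2).

```python
import math
import random
from collections import Counter


def entropy(counts, total):
    """Shannon entropy in bits of a distribution given by raw counts."""
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(attr, labels):
    """Information gain (bits) obtained by splitting `labels` on `attr`."""
    n = len(labels)
    base = entropy(Counter(labels).values(), n)
    # Group the class labels by attribute value.
    by_value = {}
    for a, y in zip(attr, labels):
        by_value.setdefault(a, []).append(y)
    # Weighted average entropy of the class within each attribute value.
    remainder = sum(
        len(ys) / n * entropy(Counter(ys).values(), len(ys))
        for ys in by_value.values()
    )
    return base - remainder


def mean_gain_under_null(n_values, n_samples=200, n_classes=2,
                         trials=500, seed=0):
    """Average gain of an attribute that is independent of the class.

    All defaults are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        attr = [rng.randrange(n_values) for _ in range(n_samples)]
        labels = [rng.randrange(n_classes) for _ in range(n_samples)]
        total += information_gain(attr, labels)
    return total / trials


if __name__ == "__main__":
    n_samples, n_classes = 200, 2
    for k in (2, 5, 10, 20):
        simulated = mean_gain_under_null(k, n_samples, n_classes)
        # Mean of a chi-square variable equals its degrees of freedom.
        approx = (k - 1) * (n_classes - 1) / (2 * n_samples * math.log(2))
        print(f"values={k:2d}  simulated gain={simulated:.4f}  "
              f"chi-square approx={approx:.4f}")
```

Because the attributes carry no information about the class, an unbiased selection measure would rate them all equally. Referring the statistic to a chi-square distribution with the appropriate degrees of freedom accounts for the number of values, which is the kind of automatic compensation the abstract refers to.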

Keywords: Decision trees, noise, induction, unbiased attribute selection, information-based measures

Paper URL: https://doi.org/10.1023/A:1022694010754