A Hellinger-based discretization method for numeric attributes in classification learning

Authors:

Highlights:

Abstract

Many classification algorithms require that training examples contain only discrete values. In order to use these algorithms when some attributes have continuous numeric values, the numeric attributes must be converted into discrete ones. This paper describes a new way of discretizing numeric values using information theory. Our method is context-sensitive in the sense that it takes the value of the target attribute into account. The amount of information each interval provides about the target attribute is measured using Hellinger divergence, and the interval boundaries are chosen so that each interval contains as equal an amount of information as possible. To compare our discretization method with several current discretization methods, a number of popular classification data sets are selected for discretization. We use the naive Bayesian classifier and C4.5 as classification tools to compare the accuracy of our discretization method with that of the other methods.
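As a rough illustration of the idea summarized above (a sketch of the general approach, not the paper's exact algorithm), the Python snippet below scores each distinct value of a numeric attribute by the Hellinger distance between the class distribution conditioned on that value and the overall class distribution, then places cut points so that every interval accumulates roughly the same total score. The function names `hellinger` and `hellinger_discretize`, the parameter `n_intervals`, and the specific Hellinger-distance formula used are assumptions made for this example.

```python
import numpy as np

def hellinger(p, q):
    # Hellinger distance between two discrete distributions p and q
    # (assumed form: sqrt(0.5 * sum((sqrt(p_i) - sqrt(q_i))^2))).
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

def hellinger_discretize(x, y, n_intervals=5):
    # Sketch of context-sensitive discretization: each distinct value of x
    # is scored by how much its conditional class distribution diverges
    # from the prior class distribution; boundaries are placed so each
    # interval carries roughly an equal share of the total score.
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    prior = np.array([(y == c).mean() for c in classes])

    values = np.unique(x)
    scores = np.array([
        hellinger(np.array([(y[x == v] == c).mean() for c in classes]), prior)
        for v in values
    ])

    # Cumulative score; cut where each 1/n_intervals share is reached.
    cum = np.cumsum(scores)
    targets = cum[-1] * np.arange(1, n_intervals) / n_intervals
    cut_idx = np.minimum(np.searchsorted(cum, targets), len(values) - 1)
    return np.unique(values[cut_idx])

# Example usage on synthetic data (hypothetical values):
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=200)
    y = (x + rng.normal(scale=0.5, size=200) > 0).astype(int)
    print(hellinger_discretize(x, y, n_intervals=4))
```

In this sketch the equal-information criterion is approximated by equalizing cumulative per-value divergence; the paper's actual boundary-selection procedure may differ in detail.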

Keywords: Machine learning, Discretization, Data mining, Knowledge discovery

Article history: Received 14 February 2004, Accepted 3 June 2006, Available online 20 October 2006.

DOI: https://doi.org/10.1016/j.knosys.2006.06.005