A less-greedy two-term Tsallis Entropy Information Metric approach for decision tree classification

Authors:

Highlights:

Abstract

The construction of efficient and effective decision trees remains a key topic in machine learning because of their simplicity and flexibility. Many heuristic algorithms have been proposed to construct near-optimal decision trees. Most of them, however, are greedy algorithms and therefore tend to find only local optima. In addition, the conventional split criteria they use, e.g. Shannon entropy, Gain Ratio and Gini index, are one-term criteria that lack adaptability to different datasets. To address these issues, we propose a less-greedy two-term Tsallis Entropy Information Metric (TEIM) algorithm with a new split criterion and a new construction method for decision trees. First, the new split criterion is based on two-term Tsallis conditional entropy, which outperforms conventional one-term split criteria. Second, the new tree construction is based on a two-stage approach that reduces greediness and avoids local optima to a certain extent. The TEIM algorithm takes advantage of the generalization ability of two-term Tsallis entropy and the low greediness of the two-stage approach. Experimental results on UCI datasets indicate that, compared with state-of-the-art decision tree algorithms, the TEIM algorithm yields statistically significantly better decision trees and is more robust to noise.
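As background for the split criteria named in the abstract, the following is a minimal Python sketch of the standard one-term Tsallis entropy, S_q = (1 - Σ p_i^q) / (q - 1), which reduces to Shannon entropy as q approaches 1 and to the Gini index at q = 2. It is not the two-term TEIM criterion or the two-stage construction, neither of which is specified in the abstract. The function names are hypothetical, and the conditional form used here (a size-weighted average over the partitions induced by an attribute) is a simplifying assumption.

```python
import math
from collections import Counter


def tsallis_entropy(labels, q):
    """Standard one-term Tsallis entropy of a sequence of class labels.

    Illustrative helper (hypothetical name), not the paper's TEIM criterion.
    """
    n = len(labels)
    probs = [count / n for count in Counter(labels).values()]
    if abs(q - 1.0) < 1e-12:
        # The limit q -> 1 is Shannon entropy (natural logarithm).
        return -sum(p * math.log(p) for p in probs)
    return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)


def conditional_tsallis_entropy(feature_values, labels, q):
    """Size-weighted average Tsallis entropy of the labels after splitting
    on a categorical attribute (assumed form, chosen for simplicity)."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    return sum(len(g) / n * tsallis_entropy(g, q) for g in groups.values())


if __name__ == "__main__":
    y = ["a", "a", "b", "b"]
    x_informative = [0, 0, 1, 1]    # separates the classes perfectly
    x_uninformative = [0, 1, 0, 1]  # leaves both partitions mixed
    print(tsallis_entropy(y, 2))    # Gini index of a balanced binary set
    print(conditional_tsallis_entropy(x_informative, y, 2))
    print(conditional_tsallis_entropy(x_uninformative, y, 2))
```

A lower conditional entropy indicates a more informative attribute, so a greedy tree builder would pick `x_informative` here. Varying `q` is what gives a Tsallis-based criterion its adaptability across datasets.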

Keywords: Decision trees, Attribute split criterion, Tree construction, Classification

Article history: Received 19 January 2016, Revised 14 December 2016, Accepted 20 December 2016, Available online 21 December 2016, Version of Record 15 February 2017.

Paper URL: https://doi.org/10.1016/j.knosys.2016.12.021