Region compatibility based stability assessment for decision trees

Authors:

Highlights:

Abstract

Decision tree learning algorithms are known to be unstable: small changes in the training data can produce very different decision trees. An important issue is therefore how to quantify decision tree stability. Two types of stability are defined in the literature: structural stability and semantic stability. However, existing structural stability measures are meaningless when applied to apparently different decision trees, and semantic stability focuses only on prediction accuracy without considering structural information. This paper proposes a region compatibility based structural stability measure for decision trees that considers the structural distribution of leaves from the perspective of basic probability assignments in evidence theory. To the best of our knowledge, we are the first to use basic probability assignments to quantify decision tree stability. We prove convergence for region compatibility and show that apparently different decision trees have some inherent similarity from the perspective of region compatibility. We also clarify the meaning of region compatibility for measuring decision tree stability, and derive a method for selecting a relatively stable learning algorithm for a given dataset. Experimental results validate that region compatibility is effective for quantifying the stability of decision tree learning algorithms.
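The gap between structural and semantic views that the abstract describes can be illustrated with a small toy case. The sketch below is not the paper's region compatibility measure, which is not defined in this summary. It only shows, under assumed hand-built trees (`TREE_A`, `TREE_B`, and the helpers `predict` and `prediction_agreement` are all hypothetical names), that two trees can look different at the root while inducing the same leaf regions.

```python
from itertools import product

# Hypothetical toy trees, not taken from the paper.
# An internal node is (feature_index, threshold, left_subtree, right_subtree);
# a leaf is a plain string label.
TREE_A = (0, 0.5, (1, 0.5, "a", "b"), (1, 0.5, "c", "d"))  # splits on x first
TREE_B = (1, 0.5, (0, 0.5, "a", "c"), (0, 0.5, "b", "d"))  # splits on y first


def predict(tree, point):
    """Walk from the root to a leaf and return the leaf label."""
    while not isinstance(tree, str):
        feature, threshold, left, right = tree
        tree = left if point[feature] <= threshold else right
    return tree


def prediction_agreement(tree_1, tree_2, points):
    """Fraction of points on which both trees return the same label."""
    same = sum(predict(tree_1, p) == predict(tree_2, p) for p in points)
    return same / len(points)


# Evaluate both trees on an 11 x 11 grid over the unit square.
grid = [(x / 10, y / 10) for x, y in product(range(11), repeat=2)]
agreement = prediction_agreement(TREE_A, TREE_B, grid)

# A naive structural comparison: do the two roots test the same feature?
same_root_feature = TREE_A[0] == TREE_B[0]

print(f"prediction agreement: {agreement}")
print(f"same root feature: {same_root_feature}")
```

Here the root-level comparison reports a difference, yet both trees partition the unit square into the same four quadrants. A region-oriented measure is presumably meant to capture this kind of underlying similarity, although the paper's actual construction via basic probability assignments may differ substantially from this toy comparison.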

Keywords: Machine learning, Decision tree, Stability measurement, Region compatibility, Evidence theory

Review history: Received 31 May 2017, Revised 19 March 2018, Accepted 20 March 2018, Available online 23 March 2018, Version of Record 24 April 2018.

Paper URL: https://doi.org/10.1016/j.eswa.2018.03.036