Recognition of multi-interval rules in dataset with continuous-valued attributes

Abstract

Decision tree induction has been widely applied to knowledge management in practice because it learns quickly and produces explicit, easily interpreted knowledge structures. However, its node-and-branch structure may limit decision tree induction to attributes that are nominal or discrete-valued. Discretizing (splitting) a candidate continuous-valued attribute into a finite, manageable number of intervals is therefore crucial when applying decision tree induction. This study proposes an integrated approach that combines decision tree induction with hierarchical clustering analysis: intervals are merged according to a proposed measure that considers both within-attribute closeness and the similarity of discretization results. The integrated approach supports multi-interval discretization and aims to produce more accurate classification rules by easing the handling of continuous-valued attributes in decision tree induction. The approach was evaluated on both predictive accuracy and decision tree size using test databases from the UCI Machine Learning Repository.
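The general idea of merging intervals by hierarchical (agglomerative) clustering can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not define the proposed measure, so the `distance` function below is a hypothetical stand-in that adds a normalized gap between adjacent intervals (a rough proxy for within-attribute closeness) to the difference between their class distributions (a rough proxy for similarity of discretization results). The names `initial_intervals`, `distance`, and `merge_intervals`, and the stopping parameter `k`, are likewise assumptions made for illustration.

```python
from collections import Counter


def initial_intervals(values, labels):
    """Sort by value and start a new interval whenever the class label changes."""
    intervals = []  # each interval is (low, high, Counter of class labels)
    prev_label = None
    for v, y in sorted(zip(values, labels)):
        if intervals and y == prev_label:
            lo, _, counts = intervals[-1]
            counts[y] += 1
            intervals[-1] = (lo, v, counts)
        else:
            intervals.append((v, v, Counter({y: 1})))
        prev_label = y
    return intervals


def distance(a, b, value_range):
    """Hypothetical merge measure (an assumption, not the paper's measure)."""
    # Gap between the two adjacent intervals, scaled by the attribute's range.
    gap = (b[0] - a[1]) / value_range
    # Total variation distance between the two class distributions.
    na, nb = sum(a[2].values()), sum(b[2].values())
    classes = set(a[2]) | set(b[2])
    diff = 0.5 * sum(abs(a[2][c] / na - b[2][c] / nb) for c in classes)
    return gap + diff


def merge_intervals(values, labels, k):
    """Repeatedly merge the closest pair of adjacent intervals until k remain."""
    intervals = initial_intervals(values, labels)
    value_range = (max(values) - min(values)) or 1.0
    while len(intervals) > k:
        i = min(
            range(len(intervals) - 1),
            key=lambda j: distance(intervals[j], intervals[j + 1], value_range),
        )
        a, b = intervals[i], intervals[i + 1]
        intervals[i:i + 2] = [(a[0], b[1], a[2] + b[2])]
    return intervals


values = [1, 2, 3, 10, 11, 12, 20, 21]
labels = ["a", "a", "a", "b", "b", "b", "a", "a"]
three = merge_intervals(values, labels, k=3)
two = merge_intervals(values, labels, k=2)
```

Only adjacent intervals are candidates for merging, which keeps every resulting interval contiguous on the attribute's axis. In a full pipeline the resulting interval boundaries would serve as cut points for the discretized attribute before tree induction; choosing `k` in advance is a simplification of this sketch.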

Keywords: Knowledge discovery in database, Machine learning, Knowledge management, Continuous-valued attribute

Review history: Available online 8 December 2007.

Paper URL: https://doi.org/10.1016/j.eswa.2007.11.042