Don't care values in induction

Abstract

Inductive learning algorithms are powerful tools for extracting knowledge from data, and their success in medical domains is well known. In medical diagnosis, and in real-world applications generally, inductive learning algorithms must deal with unknown values, among other problems. In most cases unknown values are treated as missing values, i.e. values that are related to the class of the training examples but are absent because no measurement was taken. In this paper we address the problem of don't care values, which are unknown because they are irrelevant to the class of the examples. The distinction between don't care values and missing values is important in medical domains: it enables experts to relate each diagnosis to the appropriate subset of attributes. We present techniques for dealing efficiently with don't care values in the induction of decision trees. Furthermore, we examine the importance of the distinction between missing and don't care values, and we investigate whether unknown values in medical and non-medical real-world datasets are don't care values rather than missing ones.

Keywords: Don't care values, Unknown values, Inductive learning, Decision trees, Medical diagnosis

Review history: Received 19 February 1995, Accepted 6 May 1996, Available online 23 March 1999.

Official URL: https://doi.org/10.1016/S0933-3657(96)00357-0