Classification of incomplete data based on belief functions and K-nearest neighbors

Abstract

Classifying incomplete data with missing values both correctly and precisely can be quite difficult, because the missing information usually introduces ambiguity (uncertainty) into the classification result. Belief function theory is well suited to modeling such uncertain and imprecise information. On this basis, a new belief-based method for credal classification of incomplete data (CCI) is proposed that uses the K nearest neighbors (KNNs) strategy. In CCI, each of the KNNs of the object (the incomplete pattern) is used to estimate the missing values, which yields K edited versions of the pattern. Each edited pattern is classified by any classical method, giving K classification results, and each result is discounted (weighted) by a factor that depends on the distance between the object and the corresponding neighbor. These K results, represented by basic belief assignments (BBAs), are then globally fused to obtain the credal classification of the object. The conflicting beliefs produced during fusion capture the degree of imprecision of the classification. They are transferred to a selected meta-class, defined by the disjunction of several classes (i.e. the set of several classes), according to the current context. Thus, an incomplete pattern that is hard to classify correctly because of its missing values is reasonably committed to a proper meta-class, which characterizes the imprecision of the classification and also reduces errors. Three experiments illustrate the potential and interest of the CCI approach.
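The abstract does not give the exact formulas, so the following Python sketch shows only one possible reading of the CCI pipeline. It makes several assumptions, none of which come from the source. Missing values are encoded as NaN, and neighbor distances use only the observed attributes. The "classical method" is a hypothetical nearest-centroid classifier. The discount factor is assumed to be α_k = exp(-d_k / mean(d)). Global fusion swaps in a Dubois–Prade-style rule that sends every conflicting product of masses to the union of the focal elements involved (a meta-class), rather than the paper's context-dependent selection of meta-classes. All function names are hypothetical.

```python
import itertools
import math

import numpy as np


def knn_edited_patterns(x, X_train, K):
    """Fill the NaN entries of x with the values of each of its K nearest
    neighbours; distances are Euclidean over the observed attributes only."""
    observed = ~np.isnan(x)
    diffs = X_train[:, observed] - x[observed]
    dists = np.sqrt((diffs ** 2).sum(axis=1))
    idx = np.argsort(dists)[:K]
    edited = []
    for i in idx:
        p = x.copy()
        p[~observed] = X_train[i, ~observed]  # estimate from neighbour i
        edited.append(p)
    return edited, dists[idx]


def nearest_centroid(p, centroids):
    """Hypothetical stand-in for 'any classical method': nearest class mean."""
    return int(np.argmin(((centroids - p) ** 2).sum(axis=1)))


def discount_factors(dists):
    """Assumed weighting: closer neighbours get factors nearer to 1."""
    scale = dists.mean()
    if scale == 0:
        return np.ones_like(dists)
    return np.exp(-dists / scale)


def fuse(labels, alphas, n_classes):
    """Shafer-discount each categorical BBA: m_k({label_k}) = alpha_k and
    m_k(Omega) = 1 - alpha_k. Then combine all K BBAs at once: a product
    goes to the intersection of the chosen focal elements, or, if that is
    empty (conflict), to their union, i.e. a meta-class."""
    omega = frozenset(range(n_classes))
    sources = [[(frozenset([c]), a), (omega, 1.0 - a)]
               for c, a in zip(labels, alphas)]
    fused = {}
    for combo in itertools.product(*sources):
        sets = [s for s, _ in combo]
        mass = math.prod(w for _, w in combo)
        if mass == 0:
            continue
        inter = frozenset.intersection(*sets)
        target = inter if inter else frozenset.union(*sets)
        fused[target] = fused.get(target, 0.0) + mass
    return fused


def cci(x, X_train, y_train, K=3):
    """Credal classification of one incomplete pattern x. Classes are
    referred to by their index in np.unique(y_train)."""
    classes = np.unique(y_train)
    centroids = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    edited, dists = knn_edited_patterns(x, X_train, K)
    labels = [nearest_centroid(p, centroids) for p in edited]
    return fuse(labels, discount_factors(dists), len(classes))


if __name__ == "__main__":
    X_train = np.array([[1.5, 0], [0, 0], [2.5, 10], [4, 10],
                        [20, 20], [21, 21]], dtype=float)
    y_train = np.array([0, 0, 1, 1, 2, 2])
    x = np.array([2.0, np.nan])  # second attribute is missing
    for focal, mass in cci(x, X_train, y_train, K=2).items():
        print(sorted(focal), round(mass, 4))
```

In this toy run the two nearest neighbors fill the missing attribute with very different values (0 and 10), so the two edited patterns fall into classes 0 and 1. The conflicting product α_1·α_2 = e^{-2} is therefore committed to the meta-class {0, 1} rather than being normalized away. Enumerating all 2^K focal-element combinations keeps the fusion order-independent, but it is only practical for small K.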