On metricity of two heterogeneous measures in the presence of missing values

Authors: Martti Juhola, Jorma Laurikkala

Abstract

The heterogeneous Euclidean-overlap metric and the heterogeneous value difference metric, both given in the machine learning literature, are useful for handling mixed-type data in machine learning, pattern recognition and data mining tasks. Mixed-type variables are quite common in practical problems, but this property has seldom been taken into account in pattern recognition, data mining and decision making algorithms. We found a special situation in which these two distance measures are not metrics but pseudometrics, a property to keep in mind when using them. Nevertheless, by changing their definitions somewhat, it is possible to restore metricity. The redefinition of the two measures might be especially important in medical applications, since otherwise it is possible in theory that, for example, two identical cases would be classified differently. Nearest neighbor searching tests with medical data were run to illustrate the behavior of these measures. Despite violating metricity, the original forms yielded slightly better classification results. The reason was that the real data sets tested contained very few cases that were almost similar according to these distance measures, and the original forms, which produce more separating distances than the redefinitions, were slightly better in classification.
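
The abstract does not spell out the definitions, so the following is only a minimal sketch of the heterogeneous Euclidean-overlap metric as it is usually defined in the machine learning literature: an attribute contributes 1 if either value is missing, the overlap distance (0 or 1) if it is nominal, and the range-normalized absolute difference if it is numeric. The function name `heom`, the encoding of missing values as `None`, and the example cases and ranges are assumptions made for illustration. The sketch shows one way in which missing values can conflict with the metric axioms, namely that a case containing a missing value gets a nonzero distance to itself; whether this is exactly the situation the authors analyze is not stated in the abstract.

```python
import math

MISSING = None  # assumption: missing values are encoded as None


def heom(x, y, nominal, ranges):
    """Heterogeneous Euclidean-overlap distance between cases x and y.

    nominal: set of attribute indices treated as nominal
    ranges:  dict mapping each numeric attribute index to (max - min)
             of that attribute over the data set
    """
    total = 0.0
    for a, (xa, ya) in enumerate(zip(x, y)):
        if xa is MISSING or ya is MISSING:
            d = 1.0                        # maximal distance when a value is missing
        elif a in nominal:
            d = 0.0 if xa == ya else 1.0   # overlap distance for nominal attributes
        else:
            d = abs(xa - ya) / ranges[a]   # range-normalized numeric difference
        total += d * d
    return math.sqrt(total)


# Hypothetical cases: (sex, temperature, systolic blood pressure)
nominal = {0}
ranges = {1: 5.0, 2: 100.0}
incomplete = ("male", 37.5, MISSING)
complete = ("male", 37.5, 120.0)

d_incomplete = heom(incomplete, incomplete, nominal, ranges)
d_complete = heom(complete, complete, nominal, ranges)
print(d_incomplete)  # 1.0: the case is at nonzero distance from itself
print(d_complete)    # 0.0: without missing values the self-distance is zero
```

In a nearest neighbor search this means that an exact duplicate of an incomplete case is not guaranteed to be its closest neighbor, which is consistent with the concern about identical cases raised in the abstract.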

Keywords: Metric, Distance, Mixed-type variables, Missing values, Medical data

Paper URL: https://doi.org/10.1007/s10462-009-9096-7