Multiplicative distance: a method to alleviate distance instability for high-dimensional data

作者:Jafar Mansouri, Morteza Khademi

摘要

Recently, it has been shown that under a broad set of conditions, the commonly used distance functions will become unstable in high-dimensional data space; i.e., the distance to the farthest data point approaches the distance to the nearest data point of a given query point with increasing dimensionality. It has been shown that if dimensions are independently distributed, and normalized to have zero mean and unit variance, instability happens. In this paper, it is shown that the normalization condition is not necessary, but all appropriate moments must be finite. Furthermore, a new distance function, namely multiplicative distance, is introduced. It is theoretically proved that this function is stable for data with independent dimensions (with identical or nonidentical distribution). In contrast to usual distance functions which are based on the summation of distances over all dimensions (distance components), the multiplicative distance is based on the multiplication of distance components. Experimental results show the stability of the multiplicative distance for data with independent and correlated dimensions in the high-dimensional space and the superiority of the multiplicative distance over the norm distances for the high-dimensional data.

论文关键词:Distance instability, High-dimensional data, Minkowski and fractional norms, Multiplicative and additive distances

论文评审过程:

论文官网地址:https://doi.org/10.1007/s10115-014-0813-4