Intrinsic dimension estimation method based on correlation dimension and kNN method

Authors:

Highlights:

Abstract

In practical problems, high-dimensional data usually have a low-dimensional structure, or the data lie on a low-dimensional manifold. The dimension of this manifold is called the intrinsic dimension of the data. Among the many intrinsic dimension estimation methods, those based on the correlation dimension have received extensive attention. However, correlation dimension-based methods often return a dimension lower than the true intrinsic dimension of the dataset. To explore the reasons behind this underestimation, the probabilities of underestimation, overestimation, and proper estimation are analyzed using order statistics. The analysis shows that the probability of underestimation is much higher than that of the other two cases, and this result is verified by simulation experiments. Based on this analysis, a new intrinsic dimension estimation method is proposed that combines the correlation dimension with the k-nearest neighbor (kNN) method and effectively reduces underestimation. The method is implemented using two algorithms, namely a search algorithm and a matching algorithm. Comprehensive experiments on simulated and real datasets show that the proposed algorithms are more effective than the comparison methods.
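For readers unfamiliar with the correlation dimension, the following is a minimal sketch of the classical estimator (the slope of log C(r) against log r, where C(r) is the fraction of point pairs closer than r). It is not the search or matching algorithm proposed in the paper, whose details are not given in this abstract. The function names, the synthetic dataset, and the chosen radii are illustrative assumptions.

```python
import numpy as np

def correlation_integral(points, r):
    """Fraction of distinct point pairs whose Euclidean distance is below r."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    upper = np.triu_indices(len(points), k=1)  # count each pair once
    return np.mean(dists[upper] < r)

def correlation_dimension(points, radii):
    """Least-squares slope of log C(r) versus log r over the given radii."""
    c = np.array([correlation_integral(points, r) for r in radii])
    slope, _ = np.polyfit(np.log(radii), np.log(c), 1)
    return slope

# Assumed toy data: a 2-D unit square embedded in 5-D space,
# so the true intrinsic dimension is 2.
rng = np.random.default_rng(0)
latent = rng.uniform(size=(800, 2))
points = np.hstack([latent, np.zeros((800, 3))])

# Assumed radius range; in practice it must be tuned to the data scale.
radii = np.logspace(-1.5, -0.7, 8)
estimate = correlation_dimension(points, radii)
print(f"estimated intrinsic dimension: {estimate:.2f}")
```

On data like this the estimate should land near 2, although the result depends on the sample size and on the radius range, and points near the boundary of the square have fewer neighbors than interior points, which can pull the fitted slope downward.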

Keywords: Intrinsic dimension, Order statistics, Estimation method, Correlation dimension, k-Nearest Neighbor (kNN)

Review history: Received 21 February 2021, Revised 18 October 2021, Accepted 19 October 2021, Available online 22 October 2021, Version of Record 6 November 2021.

Official paper URL: https://doi.org/10.1016/j.knosys.2021.107627