Efficient indexing of high-dimensional data through dimensionality reduction

Authors:

Abstract

The performance of the R-tree indexing method is known to deteriorate rapidly as the dimensionality of the data increases. In this paper, we present a technique for dimensionality reduction that groups d distinct attributes into k disjoint clusters and maps each cluster to a linear space. The resulting k-dimensional space (where k may be much smaller than d) can then be indexed efficiently using an R-tree. We present algorithms for decomposing a query region in the native d-dimensional space into corresponding query regions in the k-dimensional space, as well as search and update operations for the “dimensionally-reduced” R-tree. Experiments were conducted using real data sets for point, region, and OLAP queries. The results indicate potential for significant performance gains over a naive strategy in which an R-tree index is created on the native d-dimensional space.
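
The grouping-and-linearization idea can be illustrated with a minimal sketch. It is not the paper's algorithm. The keywords name the Hilbert curve, but this sketch swaps in a Z-order (Morton) bit interleaving because it is shorter to write for clusters of any size. The function names (`morton_key`, `reduce_point`, `reduce_box`), the integer-grid coordinates, and the fixed `bits` resolution are all assumptions made for illustration. A Z-order key never decreases when any single coordinate increases, so a d-dimensional box maps to one conservative key interval per cluster. That interval may also cover points outside the box, so candidates would still need a check against the original query.

```python
from typing import List, Sequence, Tuple


def morton_key(coords: Sequence[int], bits: int) -> int:
    """Interleave the low `bits` bits of each coordinate into one integer.

    Z-order stand-in for the space-filling curve; illustrative only.
    """
    n = len(coords)
    key = 0
    for b in range(bits):
        for i, c in enumerate(coords):
            # Bit b of coordinate i lands at position b * n + i of the key.
            key |= ((c >> b) & 1) << (b * n + i)
    return key


def reduce_point(point: Sequence[int],
                 clusters: Sequence[Sequence[int]],
                 bits: int) -> List[int]:
    """Map a d-dimensional point to k keys, one per attribute cluster."""
    return [morton_key([point[a] for a in cluster], bits)
            for cluster in clusters]


def reduce_box(lo: Sequence[int],
               hi: Sequence[int],
               clusters: Sequence[Sequence[int]],
               bits: int) -> List[Tuple[int, int]]:
    """Map a d-dimensional box to k conservative key intervals.

    Because the Z-order key is monotone in each coordinate, every point in
    the box has a key between those of the lower and upper corners. The
    interval is a superset filter, not an exact decomposition.
    """
    lo_keys = reduce_point(lo, clusters, bits)
    hi_keys = reduce_point(hi, clusters, bits)
    return list(zip(lo_keys, hi_keys))


# Hypothetical example: d = 6 attributes grouped into k = 3 clusters of 2.
clusters = [[0, 1], [2, 3], [4, 5]]
reduced = reduce_point([1, 0, 3, 3, 2, 1], clusters, bits=2)
```

The `reduced` list is a k-dimensional point that a conventional R-tree could index. The output of `reduce_box` is the matching k-dimensional query rectangle under the same assumptions.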

Keywords: R-trees, Dimensionally-reduced R-trees, Hilbert curve, High-dimensional space, Linear space

Article history: Received 4 November 1998, Revised 21 July 1999, Accepted 23 July 1999, Available online 29 November 1999.

Official paper URL: https://doi.org/10.1016/S0169-023X(99)00031-2