Succinct nearest neighbor search

作者:

Highlights:

摘要

An efficient and universal similarity search solution is a holy grail for multimedia information retrieval. Most similarity indexes work by mapping the original multimedia objects into simpler representations, which are then searched by proximity using a suitable distance function.In this paper we introduce a novel representation of the objects as sets of integers, with a distance that is computed using set operations. This allows us to use compressed inverted indexes, which have become a mature technology that excels in this type of search. Such indexes allow processing queries in main memory even for very large databases, so that the disk is accessed only to verify a few candidates and present the results.We show that our technique achieves very good speed/compression/recall tradeoffs. As an example, to reach 92% recall in 30-nearest neighbor searches, an index using 1 to 2.5 bytes per object inspects less than 0.6% of the actual objects. Furthermore, the ratio between the distances to the actual nearest neighbors and those reported by our algorithm is very close to 1.

论文关键词:Compressed metric indexes,Proximity search,Nearest neighbor search,Metric databases,Multimedia databases

论文评审过程:Available online 22 June 2012.

论文官网地址:https://doi.org/10.1016/j.is.2012.06.005