Pivot-based approximate k-NN similarity joins for big high-dimensional data

Authors:

Highlights:

• Study of approximate k-NN similarity joins for big high-dimensional data.

• Pivot-based k-NN join methods supporting various levels of approximation guarantee.

• Implementation and algorithm extensions with publicly available source code.

• Comprehensive experiments using high-dimensional data and popular Big Data systems.

Keywords: Hadoop, Spark, MapReduce, k-NN, Approximate similarity join, High-dimensional data

Review history: Received 10 May 2018, Accepted 27 June 2019, Available online 2 July 2019, Version of Record 8 August 2019.

Official paper URL: https://doi.org/10.1016/j.is.2019.06.006