Pivot-based approximate k-NN similarity joins for big high-dimensional data

Authors:

Highlights:

• Study of approximate k-NN similarity joins for big high-dimensional data.

• Pivot-based k-NN join methods supporting various levels of approximation guarantee.

• Implementation and algorithm extensions with publicly available source code.

• Comprehensive experiments using high-dimensional data and popular Big Data systems.

Keywords: Hadoop, Spark, MapReduce, k-NN, Approximate similarity join, High-dimensional data

Article history: Received 10 May 2018, Accepted 27 June 2019, Available online 2 July 2019, Version of Record 8 August 2019.

Official paper URL: https://doi.org/10.1016/j.is.2019.06.006