Minmax Circular Sector Arc for External Plagiarism’s Heuristic Retrieval stage

Authors:

Highlights:

• Locality-sensitive hashing algorithms for the nearest neighbor search problem are proposed.

• The algorithms represent sketches of documents as unique numeric values.

• The algorithms reduce hashing and retrieval time by 50% and 33%, respectively.

• The number of permutations should be strictly controlled to obtain the desired recall.
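
To make the ideas in the highlights concrete, the following is a minimal sketch of generic min-max hashing for Jaccard similarity estimation. It is not the paper's Minmax Circular Sector Arc algorithm, whose details are not given here. It only illustrates why keeping both the minimum and the maximum of each permutation lets a sketch of a given length be built from half as many permutations. The function names, the linear hash family, the CRC32-based element IDs, and the parameter values are all assumptions chosen for illustration.

```python
import random
import zlib

# Assumption: a large Mersenne prime as modulus for the illustrative hash family.
PRIME = (1 << 61) - 1


def make_hash_funcs(k, seed=0):
    """Return k random linear hash functions (a, b), standing in for permutations."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(k)]


def element_id(token):
    """Map a token (e.g. a word n-gram) to a stable integer ID."""
    return zlib.crc32(token.encode("utf-8"))


def minmax_sketch(ids, hash_funcs):
    """Build a sketch with two values (min and max) per hash function."""
    sketch = []
    for a, b in hash_funcs:
        values = [(a * x + b) % PRIME for x in ids]
        sketch.append(min(values))
        sketch.append(max(values))
    return sketch


def estimate_jaccard(sketch_a, sketch_b):
    """Estimate Jaccard similarity as the fraction of matching sketch positions."""
    matches = sum(1 for u, v in zip(sketch_a, sketch_b) if u == v)
    return matches / len(sketch_a)


# Two hypothetical documents whose token sets have a true Jaccard similarity of 0.5.
doc_a = {element_id(f"tok{i}") for i in range(0, 150)}
doc_b = {element_id(f"tok{i}") for i in range(50, 200)}

funcs = make_hash_funcs(k=100)  # 100 hash functions give a 200-value sketch
sk_a = minmax_sketch(doc_a, funcs)
sk_b = minmax_sketch(doc_b, funcs)
estimate = estimate_jaccard(sk_a, sk_b)
```

In this sketch, 100 hash functions yield 200 sketch values, whereas plain min-hash would need 200 hash functions for a sketch of the same length. The estimate becomes noisier as the number of permutations shrinks, which is one reason that number has to be chosen with the target recall in mind.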

Keywords: External Plagiarism, Heuristic Retrieval, Locality-sensitive hashing, High-dimensional spaces, Pattern clustering, Approximate nearest neighbor search, Hashing method, Hashing time reduction, Min–max hash method, Pairwise Jaccard similarity estimation, Scalable similarity search, Approximation algorithms, Computational efficiency, Nearest neighbor searches, Jaccard similarity

Article history: Received 4 January 2017, Revised 28 June 2017, Accepted 12 August 2017, Available online 18 August 2017, Version of Record 18 October 2017.

Article URL: https://doi.org/10.1016/j.knosys.2017.08.013