Supporting set-valued joins in NoSQL using MapReduce

Authors:

Highlights:

• We developed a set-similarity join solution in NoSQL using MapReduce.

• Our set-similarity join algorithm can avoid redundant comparisons between join attribute values in the MapReduce framework.

• We substantially decreased the amount of network traffic in the MapReduce framework.

• We reduced the number of comparisons needed to find all similar pairs by extending the prefix filtering technique to the MapReduce framework.

• Our solution achieved up to an order-of-magnitude performance improvement over the most efficient existing solution.
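
For readers unfamiliar with the prefix filtering technique named in the highlights, the following is a minimal single-machine sketch of the general idea for a Jaccard self-join. It is not the paper's MapReduce algorithm and does not use its trie structure; the function names (`jaccard`, `prefix_filter_self_join`), the frequency-based token order, and the sample data are assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two token collections."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def prefix_filter_self_join(records, threshold):
    """Return sorted id pairs whose Jaccard similarity is >= threshold.

    records: dict mapping a record id to an iterable of tokens (hypothetical input shape).
    """
    # Global token order: rare tokens first, so prefixes hold selective tokens.
    freq = Counter(tok for toks in records.values() for tok in set(toks))

    # Inverted index built only over each record's prefix tokens.
    index = defaultdict(list)
    for rid, toks in records.items():
        ordered = sorted(set(toks), key=lambda t: (freq[t], t))
        n = len(ordered)
        if n == 0:
            continue
        # Two sets with Jaccard >= threshold must share a token within
        # their first n - ceil(threshold * n) + 1 tokens under a common order.
        # The small epsilon guards against float rounding shortening the prefix.
        prefix_len = n - math.ceil(threshold * n - 1e-9) + 1
        for tok in ordered[:prefix_len]:
            index[tok].append(rid)

    # Records sharing a prefix token become candidate pairs.
    candidates = set()
    for rids in index.values():
        for a, b in combinations(sorted(rids), 2):
            candidates.add((a, b))

    # Verification: compute the exact similarity only for candidates.
    return sorted(
        (a, b) for a, b in candidates
        if jaccard(records[a], records[b]) >= threshold
    )


if __name__ == "__main__":
    sample = {
        1: ["a", "b", "c", "d"],
        2: ["a", "b", "c", "e"],
        3: ["x", "y", "z"],
    }
    print(prefix_filter_self_join(sample, 0.5))
```

Only records that share at least one prefix token are ever compared, which is how the filter cuts down the number of exact similarity computations. In a distributed setting, prefix tokens are a natural choice of grouping key, though how the paper partitions and deduplicates work is not described in this listing.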

Keywords: Set-similarity join, MapReduce, Trie structure, Prefix filtering, NoSQL, Big data, Data mining

Review history: Received 11 November 2014, Revised 17 November 2014, Accepted 18 November 2014, Available online 26 November 2014.

Official paper URL: https://doi.org/10.1016/j.is.2014.11.005