Handling data skew in join algorithms using MapReduce

Authors:

Highlights:

• We introduce a skew handling algorithm, called multi-dimensional range partitioning.

• The proposed algorithm is more efficient than traditional MapReduce-based join algorithms.

• The proposed algorithm is scalable regardless of the size of input data.


Keywords: MapReduce, Join algorithm, Skew handling, Multi-dimensional range partitioning

Review history: Received 7 February 2015, Revised 26 October 2015, Accepted 21 December 2015, Available online 6 January 2016, Version of Record 23 January 2016.

Official paper URL: https://doi.org/10.1016/j.eswa.2015.12.024