Select-partitioned join: An improved partition-based join algorithm

作者:

Highlights:

摘要

We propose a new partition-based join algorithm, called select-partitioned join, which performs better than the sort-merge and hash-partitioned join. The proposed select-partitioned join algorithm consists of three major steps. The first step is to determine a partitioning pattern by which the total join cost can be minimized and choose the bound values of the buckets by using a selection algorithm. The second is to partition both relations into ranged buckets according to the partitioning pattern and the bound values chosen in the previous step. And the last step is to apply the nested-block join on the partitioned bucket pairs. The selection algorithm is based on the cumulative distribution function and it is performed by a single scan of the smaller relation. The performance of the select-partitioned join is analyzed in terms of the number of I/Os and compared with the sort-merge and hash-partitioned join algorithms. Our join algorithm is better than a hash-partitioned join algorithm, which is, in general, known to be better for the join operation. Simulation experiments are conducted for the join algorithms.

论文关键词:Join algorithm,relational database,selection algorithm,cumulative distribution function

论文评审过程:Received 30 October 1989, Revised 27 November 1990, Available online 17 June 2003.

论文官网地址:https://doi.org/10.1016/0306-4379(91)90015-2