Estimating the size of a relational join

作者:

Highlights:

摘要

This paper shows how to esimtate the size of the natural join of two relations. Such an estimate is valuable in query optimization as joins constitute the bulk of the work required in answering multi-relation queries. The final optimized strategy can be adjusted if an accurate estimate of the size of the resulting join can be efficiently and reliably determined. The estimate is also valuable in distributed systems. Such systems may do an operation called a semijoin. A reliable estimate of the size of the final joined table gives sufficient information to determine whether the semijoin will provide a net gain or net loss in the amount of work required to do the join. The estimation method is based on partial Bloom filters. Bloom filters or segments of filters are prepared from the join attributes of the relations being joined. It is possible to estimate the size of the joined table from the size of the bitwise intersection of the two filters. Estimates of the average number of replications of the join attribute in a relation are also available when building the filters.

论文关键词:Join,distributed,estimate,Bloom filter

论文评审过程:Received 22 November 1991, Revised 24 February 1993, Available online 10 June 2003.

论文官网地址:https://doi.org/10.1016/0306-4379(93)90037-2