On the optimal ordering of multiple-field tables

作者:

Highlights:

摘要

Consider a table of nd-dimensional records, grouped into b buckets, and a set Q = {(Qh, wh)} of weighed partial-match conditions, where wh is the relative frequency of Qh. Let n(Qh) be the number of records which satisfy Qh, and b(Qh) the number of buckets in which these records are found. The problem we consider is the individuation of the optimal ordering field which minimizes the sum of accessed buckets, B(Q) = Σhwh × b(Qh). An exact solution requires the records to be sorted according to the values of each of the d fields in turn with an overall time complexity, evaluated as a function of bucket accesses, of O(d × b log b). We present an approximate O(b) algorithm which estimates the optimal ordering without the need to sort the table at all. The algorithm makes use of a probabilistic counting technique, known as linear counting, which requires a single scan of the table. Experimental results show that in most cases the approximate solution agrees with the exact one.

论文关键词:Relational databases,Physical design,Cost models,Linear counting,Optimal ordering

论文评审过程:Received 4 February 1994, Revised 16 May 1994, Accepted 25 July 1994, Available online 11 February 2003.

论文官网地址:https://doi.org/10.1016/0169-023X(94)90007-8