Access cost estimation for physical database design

Abstract

In this work we propose access cost estimation models suited to the physical design of a relational database when a set of secondary indexes has to be built on some attributes of the relations. The models are tailored to distinct kinds of queries (partial-match, interval, join, etc.) and are based on a measure of association, the clustering factor, which applies between an attribute and the physical location of records in a file as well as between two (sets of) attributes. Using clustering factors together with the value selectivity of a query (i.e., how many distinct values satisfy the query) allows design-time models to be derived without first estimating the record selectivities (i.e., how many records satisfy the query) or the corresponding access costs of all the query instances that can occur at run time. In practice, unlike previous approaches to the problem, run-time models are derived by specializing design-time models, rather than vice versa. Estimation of access costs under alternative ordering criteria is also considered, and a model is proposed that allows the primary attribute to be chosen without the need to sort the tuples. The proposed models achieve a good tradeoff between accuracy and simplicity without relying on restrictive assumptions about the data, and they easily allow the design process to take advantage of semantic information about the application domain even if the data are not yet loaded in the database.
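
The distinction the abstract draws between value selectivity and record selectivity can be illustrated with a minimal Python sketch for an interval query. The column `ages`, the function names, and the closed-interval semantics are hypothetical choices made for illustration and are not taken from the paper; the clustering factor itself is not modeled here.

```python
def value_selectivity(values, low, high):
    """Count the distinct attribute values that fall in [low, high]."""
    return len({v for v in values if low <= v <= high})


def record_selectivity(values, low, high):
    """Count the records whose attribute value falls in [low, high]."""
    return sum(1 for v in values if low <= v <= high)


# Hypothetical attribute column, one entry per record (illustrative data only).
ages = [25, 30, 30, 30, 41, 41, 58]

# Interval query: 30 <= age <= 45.
distinct_hits = value_selectivity(ages, 30, 45)
record_hits = record_selectivity(ages, 30, 45)
print(distinct_hits, record_hits)
```

In this toy column the two measures differ because several records share the same value. Value selectivity depends only on the query predicate and the attribute's domain, which is consistent with the abstract's point that it can be used before the data are loaded.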

Keywords: Relational databases, physical design, cost models, clustering factor, attribute dependencies

Review history: Received 27 July 1992, Accepted 26 May 1993, Available online 13 February 2003.

Paper URL: https://doi.org/10.1016/0169-023X(93)90002-7