Statistical treatment of the information content of a database

Authors:

Abstract

The statistical analysis of database contents is usually performed with software packages that require database attributes to be coded numerically. Unfortunately, statistics computed from coded attribute values may be meaningless, as is the case when the attribute values have no intrinsic order. We present an analytical data model in which the information content of a database relation is represented by a contingency table and analysed using the methods of multivariate information theory. These quantitative tools benefit, first, the database user who is interested in a statistical view of the database contents and inclined to pose queries such as “to what extent are attributes related (in a given database state)?” or “how does one attribute depend on the others (in a given database state)?”. A second application, only sketched here, is the measurement of record selectivities for queries, with a view to evaluating the performance of the physical database organization.
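
To make the idea concrete, the following minimal sketch shows one way a relation could be summarised as a contingency table and queried for how strongly two attributes are related. The abstract does not give the paper's formulas, so the use of Shannon entropy and mutual information here is an assumption, and the relation, attribute values, and function names are hypothetical. Because only value frequencies are counted, no numerical coding or ordering of attribute values is needed.

```python
from collections import Counter
from math import log2


def entropy(counts):
    """Shannon entropy (in bits) of the distribution given by a table of counts."""
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values() if c > 0)


def mutual_information(rows, i, j):
    """Mutual information (in bits) between attributes i and j of a relation.

    The joint Counter plays the role of a two-way contingency table;
    the marginal Counters are its row and column totals.
    """
    joint = Counter((r[i], r[j]) for r in rows)
    left = Counter(r[i] for r in rows)
    right = Counter(r[j] for r in rows)
    return entropy(left) + entropy(right) - entropy(joint)


# Hypothetical relation (department, city): city is fully determined by department.
dependent_rows = [
    ("sales", "Paris"),
    ("sales", "Paris"),
    ("research", "Rome"),
    ("research", "Rome"),
]

# Hypothetical relation in which every combination occurs equally often.
independent_rows = [
    ("sales", "Paris"),
    ("sales", "Rome"),
    ("research", "Paris"),
    ("research", "Rome"),
]

mi_dependent = mutual_information(dependent_rows, 0, 1)
mi_independent = mutual_information(independent_rows, 0, 1)
```

Under these assumptions, a larger value indicates that knowing one attribute tells more about the other in the given database state, and a value of zero indicates that the two attributes are statistically independent in that state.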

Keywords:

Publication history: Available online 10 June 2003.

Official paper URL: https://doi.org/10.1016/0306-4379(86)90029-3