Rearranging data to maximize the efficiency of compression

作者:

Highlights:

摘要

We are concerned with rearranging the order in which data is stored in a database so as to maximize the amount of compression. We consider multi-attribute data defined over finite discrete domains (called categories) and seek an optimal permutation of the categories for each attribute. We study both deterministic and probabilistic models of data distribution. We show that the deterministic category rearrangement problem is NP-complete (for run length encoding compression methods) via a reduction to the rectilinear traveling salesman problem. For the probabilistic model, we show that the optimal category rearrangement for each attribute has the form of a double pipe organ, which is a generalized version of the well-known pipe organ arrangement found in earlier work on record placement on disk cylinders. For k-dimensional data we obtain an algorithm for finding the optimal arrangement which is O(n2) for fixed k.

论文关键词:

论文评审过程:Received 11 September 1986, Revised 30 July 1987, Available online 2 December 2003.

论文官网地址:https://doi.org/10.1016/0022-0000(89)90009-3