Sorting of textual data bases: A variety generation approach to distribution sorting

Authors:

Highlights:

Abstract

A method of sorting large textual data-bases by computer using external storage is proposed. The range of sort-keys in a sample of the data to be sorted is divided into a fixed set of partitions, each an ordered key range, chosen so that they should also adequately represent new data from a similar source. An incoming data stream is distributed into a series of bins according to the partition in which each key lies, and the bins are then sorted separately, using an internal sort, to give an ordered file. It is shown how the number of disc accesses needed depends on the manner in which the bins become filled, and thus on the statistics of the data. Experiments using an INSPEC data-base give information on which estimates of the efficiency of the method can be based.
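The distribute-then-sort idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the partitions are derived, so the sketch assumes evenly spaced ranks in a sorted sample as a stand-in, and it uses in-memory lists in place of bins held on disc. The names `choose_boundaries` and `distribution_sort` are hypothetical.

```python
import bisect
from typing import Iterable, List


def choose_boundaries(sample: List[str], n_bins: int) -> List[str]:
    """Pick n_bins - 1 split keys at evenly spaced ranks of the sorted sample.

    Assumption: simple rank-based splitting stands in for whatever
    partitioning scheme the paper actually uses.
    """
    if not sample or n_bins < 1:
        raise ValueError("need a non-empty sample and at least one bin")
    ordered = sorted(sample)
    return [ordered[(i * len(ordered)) // n_bins] for i in range(1, n_bins)]


def distribution_sort(stream: Iterable[str], boundaries: List[str]) -> List[str]:
    """Distribute keys into ordered bins, sort each bin, and concatenate."""
    # One bin per key range; in-memory lists stand in for bins on disc.
    bins: List[List[str]] = [[] for _ in range(len(boundaries) + 1)]
    for key in stream:
        # bisect_right finds the partition whose key range contains this key.
        bins[bisect.bisect_right(boundaries, key)].append(key)

    ordered_file: List[str] = []
    for contents in bins:
        contents.sort()  # internal sort of a single bin
        ordered_file.extend(contents)  # bins are already in key-range order
    return ordered_file


sample = ["delta", "alpha", "charlie", "bravo"]
boundaries = choose_boundaries(sample, n_bins=2)
data = ["echo", "alpha", "charlie", "bravo", "zulu"]
result = distribution_sort(data, boundaries)
```

Because every key in one bin sorts before every key in the next, concatenating the individually sorted bins yields a fully ordered file. How evenly the bins fill depends on how well the sample represents the incoming data, which is the dependence on data statistics that the abstract refers to.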

Keywords:

Review history: Received 10 September 1979; available online 13 July 2002.

Paper URL: https://doi.org/10.1016/0306-4573(80)90005-9