Compression of large inverted files with hyperbolic term distribution

Authors:

Abstract

Storage requirements for retrieval systems that use inverted files are calculated under different storage modes, and various methods for compressing these large files are analyzed. Two representations were found to be suitable: binary vectors compressed by run-length coding, and lists of document numbers. The problem of minimal storage requirements for the inverted file is solved under different assumptions about the index term distribution. A representation that combines run-length coded binary vectors with lists of document numbers was found to be the most economical. Parameter values for this minimum-storage form are calculated, tabulated, and displayed graphically.
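
As a rough illustration of the two representations named in the abstract, the Python sketch below stores one term's postings either as a list of document numbers or as a run-length coded binary vector, and picks whichever is smaller. The cost model (fixed-width fields, one flag bit for the first run, document numbers starting at 0) and all function names are assumptions made for this example; they are not taken from the paper, whose actual coding scheme and parameters may differ.

```python
from itertools import groupby
from math import ceil, log2


def bit_vector(postings, n_docs):
    """Binary vector with a 1 at each document number that contains the term."""
    present = set(postings)
    return [1 if d in present else 0 for d in range(n_docs)]


def run_length_encode(bits):
    """Return (first_bit, run_lengths) for a binary sequence."""
    runs = [len(list(group)) for _, group in groupby(bits)]
    first = bits[0] if bits else 0
    return first, runs


def run_length_decode(first, runs):
    """Inverse of run_length_encode: runs alternate between 0s and 1s."""
    bits, bit = [], first
    for length in runs:
        bits.extend([bit] * length)
        bit ^= 1
    return bits


def list_cost_bits(postings, n_docs):
    """Assumed cost: one fixed-width document number per posting."""
    width = max(1, ceil(log2(n_docs)))
    return len(postings) * width


def rle_cost_bits(postings, n_docs):
    """Assumed cost: one flag bit plus one fixed-width field per run.

    The field is wide enough to hold any run length from 1 to n_docs.
    """
    width = ceil(log2(n_docs + 1))
    _, runs = run_length_encode(bit_vector(postings, n_docs))
    return 1 + len(runs) * width


def choose_representation(postings, n_docs):
    """Pick the cheaper of the two forms for a single term."""
    as_list = list_cost_bits(postings, n_docs)
    as_rle = rle_cost_bits(postings, n_docs)
    return ("list", as_list) if as_list <= as_rle else ("rle", as_rle)


if __name__ == "__main__":
    n_docs = 16
    rare_term = [3, 9]                  # few, scattered postings
    frequent_term = list(range(2, 14))  # many consecutive postings
    print(choose_representation(rare_term, n_docs))
    print(choose_representation(frequent_term, n_docs))
```

Under these assumed costs, a term with few scattered postings is cheaper as a document-number list, while a term whose postings form long runs is cheaper as a run-length coded vector, which is the intuition behind combining the two forms in one inverted file.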

Keywords:

Publication history: Available online 17 July 2002.

Official paper URL: https://doi.org/10.1016/0306-4573(76)90035-2