Compression of index term dictionary in an inverted-file-orientated database: Some effective algorithms

Abstract

A new method for compressing the index term dictionary of an inverted-file-orientated database is discussed. A word-coding technique is described that generates short fixed-length codes from the index terms themselves by analyzing their monogram and bigram statistical distributions. Transforming the index term dictionary into a code dictionary preserves word-to-word discrimination, with a rate of three synonyms per 1300 terms, at a compression ratio of up to 90% and at low cost in CPU time. When applied in a computer network environment, the method offers substantial savings in communication channel utilization with negligible response time degradation. Experimental data are presented for the 26,113-term index dictionary of the New York Times Info Bank, which is available via a computer network.
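The two figures quoted in the abstract, the synonym rate and the compression ratio, can be made concrete with a short sketch. The abstract does not give the coding algorithm itself, so the code below does not reproduce it: `toy_code` is a hypothetical stand-in (a truncated MD5 digest) used only to have some fixed-length coder to measure. It is also an assumption here that "synonyms" means distinct terms that receive the same code, and that the compression ratio is the fraction of dictionary bytes saved.

```python
import hashlib
from typing import Callable, Dict, Iterable, List


def toy_code(term: str, code_bytes: int) -> bytes:
    """Hypothetical stand-in coder: the first `code_bytes` bytes of an MD5 digest.

    This is NOT the monogram/bigram method of the paper; it only provides
    a fixed-length code so that the metrics below have something to measure.
    """
    return hashlib.md5(term.encode("utf-8")).digest()[:code_bytes]


def evaluate(
    terms: Iterable[str],
    coder: Callable[[str, int], bytes],
    code_bytes: int,
) -> Dict[str, float]:
    """Measure synonym rate and compression ratio of a fixed-length coder.

    Assumed definitions (not taken from the paper):
      synonym_rate      = (terms sharing an already used code) / (number of terms)
      compression_ratio = 1 - (coded dictionary bytes) / (original dictionary bytes)
    """
    term_list: List[str] = list(terms)
    groups: Dict[bytes, List[str]] = {}
    for term in term_list:
        groups.setdefault(coder(term, code_bytes), []).append(term)

    # Every term beyond the first in a group collides with an earlier term.
    synonyms = sum(len(group) - 1 for group in groups.values())
    original_bytes = sum(len(term.encode("utf-8")) for term in term_list)
    coded_bytes = code_bytes * len(term_list)

    return {
        "synonym_rate": synonyms / len(term_list),
        "compression_ratio": 1 - coded_bytes / original_bytes,
    }


if __name__ == "__main__":
    sample = ["information", "retrieval", "dictionary", "compression"]
    print(evaluate(sample, toy_code, code_bytes=2))
    # The rate quoted in the abstract, three synonyms per 1300 terms:
    print(f"{3 / 1300:.4%}")
```

Under these assumed definitions, three synonyms per 1300 terms corresponds to a collision rate of roughly 0.23%. Any other coder with the same `(term, code_bytes)` signature can be passed to `evaluate` in place of `toy_code`.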

Review history: Received 19 February 1986, Revised 2 June 1986, Available online 13 July 2002.

Official paper URL: https://doi.org/10.1016/0306-4573(86)90100-7