A mathematical formulation of keyword compression for thesauri

作者:

Highlights:

摘要

In this paper we demonstrate a new method for concentrating the set of key-words of a thesaurus. This method is based on a mathematical study that we have carried out into the distribution of characters in a defined natural language.We have built a function f of concentration which generates only a few synonyms. In applying this function to the set of key-words of a thesaurus, we reduce each key-word to four characters without synonymity. (For three characters we have a rate of synonymity of approx. 1/1000th.)A new structure of binary files allows the thesaurus to be contained in a table of less than 700 bytes.

论文关键词:

论文评审过程:Available online 13 July 2002.

论文官网地址:https://doi.org/10.1016/0306-4573(77)90037-1