A new signature approach for retrieval of documents from free-text databases

Authors:

Highlights:

Abstract

Among the techniques used to retrieve information from free-text or document databases, signature methods have proven to be more efficient in terms of storage overhead and processing speed. Signature methods, however, suffer from “false drops,” in which a document is identified as a match but does not actually satisfy the user query. In signature approaches such as Word Signature and Superimposed Coding, the number of false drops is directly related to the hashing function selected, the signature size, and the number of signature buffers used for each document. Hashing functions also generate collisions, which result in false drops. In addition, these signature methods do not take into account the length of the words or the positional information of the characters that constitute each word, so “Don't Care Characters” cannot be used in queries. This paper presents a new signature approach in which the size of the signature file depends on the number of unique symbols in the alphabet; for all documents containing English text, the size is therefore constant. The signature generated by this technique maintains the positional information of characters and therefore allows Don't Care Characters to be used in queries. Implementation results and a comparison of this technique with the Superimposed Coding method are presented.
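
The contrast drawn in the abstract can be made concrete with a small sketch. The abstract does not specify the paper's actual encoding, so everything below is an illustrative assumption: `superimposed_signature` is a generic form of Superimposed Coding (a few hashed bits per word, ORed together), and `positional_signature` is one hypothetical way to build a signature whose size depends only on the alphabet (one position bitmask per letter, plus a bitmask of word lengths). The function names, the `?` wildcard, `MAX_WORD_LEN`, and the choice of SHA-256 are all hypothetical.

```python
import hashlib
import string

ALPHABET = string.ascii_lowercase  # assumed alphabet: lowercase English letters
MAX_WORD_LEN = 32                  # assumed cap on word length (hypothetical)


def superimposed_signature(words, size_bits=64, bits_per_word=3):
    """Generic superimposed coding: OR a few hashed bits per word into one mask."""
    sig = 0
    for word in words:
        digest = hashlib.sha256(word.lower().encode("utf-8")).digest()
        for k in range(bits_per_word):
            sig |= 1 << (digest[k] % size_bits)
    return sig


def may_contain(doc_sig, word, size_bits=64, bits_per_word=3):
    """True if every bit of the word's signature is set in the document signature.

    Hash collisions mean a True result can be a false drop; character
    positions are lost, so wildcard queries cannot be answered.
    """
    word_sig = superimposed_signature([word], size_bits, bits_per_word)
    return doc_sig & word_sig == word_sig


def positional_signature(words):
    """Hypothetical positional signature: one bitmask per alphabet symbol.

    Bit i of sig[ch] is set if ch occurs at position i of some word.
    Bit n of `lengths` is set if some word has length n.
    """
    sig = {ch: 0 for ch in ALPHABET}
    lengths = 0
    for word in words:
        word = word.lower()
        if len(word) > MAX_WORD_LEN or any(ch not in sig for ch in word):
            continue  # skip words outside the assumed alphabet or length cap
        lengths |= 1 << len(word)
        for i, ch in enumerate(word):
            sig[ch] |= 1 << i
    return sig, lengths


def matches(signature, pattern, wildcard="?"):
    """Test a pattern with don't-care characters against a positional signature."""
    sig, lengths = signature
    pattern = pattern.lower()
    if not (lengths >> len(pattern)) & 1:
        return False  # no word of this length in the document
    for i, ch in enumerate(pattern):
        if ch == wildcard:
            continue  # a don't-care character matches any symbol
        if not (sig.get(ch, 0) >> i) & 1:
            return False
    return True


doc_words = ["signature", "retrieval", "cat", "dog"]
doc_sig = positional_signature(doc_words)
print(matches(doc_sig, "s?gnat?re"))  # wildcard query against "signature"
print(matches(doc_sig, "cog"))        # letters come from "cat" and "dog"
```

In this sketch the positional signature always has 26 bitmasks regardless of document length, which mirrors the constant-size property claimed for English text. It still admits false drops: the query `cog` matches because its letters appear at the right positions in different words, so candidate documents would need to be verified against the text.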

Keywords: Signature, False drop, Inversion, Superimposed coding, Full-text

Review history: Received 27 January 1990, Accepted 15 May 1991, Available online 18 July 2002.

Official paper URL: https://doi.org/10.1016/0306-4573(92)90043-Y