Efficient retrieval of partial documents

Authors:

Highlights:

Abstract

Management and retrieval of large volumes of text can be expensive in both space and time. Moreover, the range of document sizes in a large collection such as TREC presents difficulties for both the retrieval mechanism and the user. We consider division of documents into parts as a solution to the problem of the range of document sizes, and show that, for databases with long documents, use of document parts can improve the quality of the information presented to the user. We also describe the compressed text database system we use to manage the TREC collection; the compressed inverted files with which it is indexed; and the techniques we use to evaluate the TREC queries, both on whole documents and on document parts.

Keywords:

Review history: Available online 21 February 2000.

Official paper URL: https://doi.org/10.1016/0306-4573(94)00052-5