Passage-based query refinement (MultiText experiments for TREC-6)

Abstract

The MultiText information retrieval system finds arbitrary passages of text, as opposed to complete documents, that are likely to be relevant to a particular topic. Passage retrieval provides the basis for the relevance ranking, term expansion, interactive user interface, and distributed searching used in the MultiText experiments for TREC-6. The essence of the relevance ranking technique is that shorter passages containing a particular set of terms are more likely to be relevant than longer ones. For the routing task, we used training data to estimate the probability of relevance as a function of passage length, and used this estimate to construct compound queries that were then applied to new data. For the ad hoc task (automatic query formulation), we retrieved passages containing terms from the title and description, and automatically selected words from these passages to expand the set of terms. For the ad hoc task (manual query formulation), we used an interactive user interface that displayed passages and allowed users to judge the relevance of documents based on these passages. A similar but more streamlined user interface was used for the high precision track, in which the objective was to retrieve ten relevant documents in five minutes. In addition, we participated in the Very Large Collection and Chinese language tracks. Our Very Large Collection experiment achieved high performance using distributed retrieval on a network of low-cost workstations, with manually constructed queries and no user interaction. For the Chinese track, we used manually constructed queries.
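The idea that shorter passages containing all query terms are more likely to be relevant can be illustrated with a minimal sketch. It assumes a document is a list of tokens, finds every "cover" (a span that contains all query terms and has no shorter sub-span that also does), and sums a length-based weight over the covers. The weight `min(1, k / length)` and the parameter `k` are hypothetical stand-ins for the learned probability-of-relevance estimate mentioned above; the abstract does not give MultiText's actual scoring function, and the function names here are illustrative only.

```python
from typing import Dict, List, Sequence, Set, Tuple


def minimal_covers(tokens: Sequence[str], terms: Set[str]) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) spans that contain every term in `terms`
    and contain no shorter span that also does."""
    covers: List[Tuple[int, int]] = []
    last_seen: Dict[str, int] = {}  # most recent position of each query term
    prev_start = -1
    for end, tok in enumerate(tokens):
        if tok not in terms:
            continue
        last_seen[tok] = end
        if len(last_seen) < len(terms):
            continue  # not every term has appeared yet
        # Shortest span ending at `end` that holds every term.
        start = min(last_seen.values())
        # If the start has not moved, this span strictly contains the
        # previous cover, so it is not minimal and is skipped.
        if start > prev_start:
            covers.append((start, end))
            prev_start = start
    return covers


def passage_score(tokens: Sequence[str], terms: Set[str], k: float = 16.0) -> float:
    """Sum a length-based weight over all covers; shorter covers weigh more.

    min(1, k / length) is a hypothetical stand-in for an estimate of the
    probability of relevance as a function of passage length; `k` is an
    illustrative parameter, not a value taken from the experiments."""
    total = 0.0
    for start, end in minimal_covers(tokens, terms):
        length = end - start + 1
        total += min(1.0, k / length)
    return total


if __name__ == "__main__":
    query = {"passage", "retrieval"}
    near = "passage retrieval finds relevant text".split()
    far = "passage based methods are one of many ways to do retrieval".split()
    print(passage_score(near, query, k=2.0))  # 1.0 (one cover of length 2)
    print(passage_score(far, query, k=2.0))   # smaller: one cover of length 11
```

Because covers are produced in order of their end positions and their starts never decrease, a span is kept only when its start advances; this single left-to-right pass is what makes the cover list minimal. In a routing setting, the hypothetical `min(1, k / length)` curve would be replaced by a function fitted to training judgments.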