Combining the evidence of multiple query representations for information retrieval

Abstract

We report on two studies in the TREC-2 program that investigated how combining multiple representations of TREC topics affects retrieval performance. In the first project, five separate Boolean queries for each of the 50 TREC routing topics and 25 of the TREC ad hoc topics were generated by 75 experienced online searchers. Using the INQUERY retrieval system, these queries were both combined into single queries and used individually to produce five separate retrieval results for each topic. In the former case, progressive combination of queries led to progressively improving retrieval performance, significantly better than that of single queries and at least as good as the best individual single-query formulations. In the latter case, data fusion of the ranked lists also led to performance better than that of any single list. In the second project, two automatically produced vector queries and three versions of a manually produced P-norm extended Boolean query for each routing and ad hoc topic were compared and combined. This project investigated six different methods of combining queries, as well as the combination of the same queries on different databases. As in the first project, progressive combination led to progressively improving results, with the best results, on average, achieved by summing retrieval status values. Both projects found that the best method of combination often led to results better than those of the best-performing single query. The combined results of the two projects were in turn combined by data fusion. The results of this procedure show that combining evidence from completely different systems also leads to performance improvement.
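As an illustration of combination by summing retrieval status values, the following is a minimal Python sketch, not the implementation used in either project. It assumes each retrieval run is a mapping from document identifier to score, and that a document absent from a run contributes nothing to its sum. The function name comb_sum, the document identifiers, and the scores are hypothetical.

```python
from collections import defaultdict


def comb_sum(runs):
    """Fuse several runs by summing each document's retrieval status values.

    `runs` is a list of dicts mapping document id -> score (an assumed
    input format). A document missing from a run adds nothing for that run.
    Returns (doc_id, fused_score) pairs, highest fused score first.
    """
    fused = defaultdict(float)
    for run in runs:
        for doc_id, score in run.items():
            fused[doc_id] += score
    # Sort by descending fused score; break ties by document id so the
    # output order is deterministic.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical scores from three query formulations of one topic.
runs = [
    {"d1": 0.9, "d2": 0.4, "d3": 0.1},
    {"d2": 0.8, "d3": 0.7},
    {"d1": 0.2, "d3": 0.6, "d4": 0.5},
]
ranking = comb_sum(runs)
ranked_ids = [doc_id for doc_id, _ in ranking]
```

In this toy example d3 is never ranked first by any single run, yet it heads the fused list because all three runs give it some support. Scores from different systems may lie on different scales, so in practice they are often normalized before summing; that step is omitted here.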