Query-dependent learning to rank for cross-lingual information retrieval

Abstract

Learning to rank (LTR), a machine learning technique for ranking tasks, has become one of the most popular research topics in information retrieval (IR). Cross-lingual information retrieval (CLIR), in which the query and the documents are in different languages, is an important IR task that can potentially benefit from LTR. This paper focuses on the use of LTR for CLIR. To rank documents in the target language in response to a query in the source language, we propose a local query-dependent LTR approach, called LQ-DLTR for CLIR. Its core idea is to construct the LTR model from the local characteristics of similar queries, instead of using a single global ranking model for all queries. Because the query and the documents are in different languages, the traditional LTR features cannot be used directly for CLIR, so defining appropriate features is a major step in applying LTR to CLIR. We define three categories of cross-lingual features: query–document features, document features, and query features. Translation resources are used to bridge the gap between the documents and the queries when computing these features. For a given query, LQ-DLTR for CLIR then identifies a neighborhood of similar queries based on the cross-lingual query features and uses the LTR algorithm to learn a local ranking function from that neighborhood. The LTR algorithm learns the model from two cross-lingual feature sets, the document features and the query–document features; the query features, which are used only to identify the neighbors, are not involved in the learning phase.
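The local, query-dependent idea described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the Euclidean distance used to find neighboring queries, the toy feature vectors, and in particular the pointwise least-squares learner, which merely stands in for whichever LTR algorithm is actually used.

```python
import math

def nearest_queries(query_vec, train_query_vecs, k):
    """Return the ids of the k training queries closest to query_vec.

    Assumption: similarity is Euclidean distance over query features.
    """
    ranked = sorted(train_query_vecs,
                    key=lambda qid: math.dist(query_vec, train_query_vecs[qid]))
    return ranked[:k]

def fit_pointwise_linear(examples, epochs=200, lr=0.05):
    """Stand-in learner: a linear scorer fitted by SGD on squared error.

    Each example is (feature_vector, relevance_label). The feature vector
    holds document and query-document features only, not query features.
    """
    weights = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        for x, y in examples:
            err = sum(w * xi for w, xi in zip(weights, x)) - y
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
    return weights

def rank_locally(query_vec, candidates, train_query_vecs, train_examples, k=2):
    """Train a model on the neighbors of the query, then rank its candidates."""
    neighbors = nearest_queries(query_vec, train_query_vecs, k)
    local_examples = [ex for qid in neighbors for ex in train_examples[qid]]
    weights = fit_pointwise_linear(local_examples)

    def score(x):
        return sum(w * xi for w, xi in zip(weights, x))

    return sorted(candidates, key=lambda d: score(candidates[d]), reverse=True)

# Hypothetical toy data: query features per training query, and labeled
# (feature_vector, relevance) pairs for the documents of each training query.
train_query_vecs = {"q1": [0.0, 0.0], "q2": [0.1, 0.0], "q3": [5.0, 5.0]}
train_examples = {
    "q1": [([1.0, 0.0], 1.0), ([0.0, 1.0], 0.0)],
    "q2": [([0.9, 0.1], 1.0), ([0.2, 0.8], 0.0)],
    "q3": [([1.0, 0.0], 0.0), ([0.0, 1.0], 1.0)],
}
candidates = {"dA": [0.8, 0.2], "dB": [0.1, 0.9]}

ranking = rank_locally([0.05, 0.0], candidates, train_query_vecs, train_examples)
```

In this toy setup the same candidate documents can be ordered differently depending on which training queries fall in the neighborhood, which is the behavior a single global model cannot provide.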
Experimental results indicate that CLIR performance improves when the cross-lingual features are computed from several translations and their probabilities, compared with the monolingual features of traditional LTR, which are computed after translating the query with only its best translation and ignoring the probabilities. Moreover, LQ-DLTR for CLIR outperforms the baseline information retrieval methods and other LTR ranking models in terms of the MAP and NDCG measures.
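The contrast between best-translation and probability-weighted features can be sketched with a single term-frequency feature. This is only an illustrative assumption: the abstract does not define the features, so the two functions, the translation table, and the example tokens below are all hypothetical.

```python
def best_translation_tf(query_terms, translations, doc_tokens):
    """Term frequency after replacing each query term by its single best translation."""
    total = 0
    for term in query_terms:
        options = translations.get(term, {})
        if not options:
            continue  # no known translation for this term
        best = max(options, key=options.get)
        total += doc_tokens.count(best)
    return total

def probabilistic_tf(query_terms, translations, doc_tokens):
    """Term frequency summed over all translations, weighted by their probabilities."""
    total = 0.0
    for term in query_terms:
        for target, prob in translations.get(term, {}).items():
            total += prob * doc_tokens.count(target)
    return total

# Hypothetical translation table: source term -> {target term: probability}.
translations = {"bank": {"banque": 0.6, "rive": 0.4}}
doc_tokens = ["la", "rive", "du", "fleuve"]

tf_best = best_translation_tf(["bank"], translations, doc_tokens)
tf_prob = probabilistic_tf(["bank"], translations, doc_tokens)
```

Here the document contains only the less likely translation, so the best-translation feature misses the match entirely while the weighted feature still gives it partial credit.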