Extracting translations from comparable corpora for Cross-Language Information Retrieval using the language modeling framework

Authors:

Highlights:

• Proposing a language modeling method to extract translations from comparable corpora.

• Comparing two similarity functions for deriving bilingual word correlations.

• Improving translation quality by integrating co-occurrence relations into word models.

• Comparing different estimations of translation probabilities from word correlations.

• Showing the significant impact of probability estimation methods on CLIR performance.
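The highlights name the steps (bilingual word correlations from comparable corpora, then translation probabilities, then CLIR) without defining them. The sketch below is a rough, hypothetical illustration of that kind of pipeline. It swaps in the classic context-vector technique: count co-occurring words, project source-language contexts through a seed bilingual lexicon, compare them with cosine similarity, and normalize the scores into translation probabilities P(t|s). It then forms a target-language query model P(t|Q) = Σ_s P(t|s)·P(s|Q). This is a generic technique, not the paper's language-modeling estimator or its similarity functions. The function names, the toy English/French corpora, the seed lexicon, the window size and the top-k normalization are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical sketch: generic context-vector translation extraction,
# NOT the paper's language-modeling method.


def context_vectors(sentences, window=2):
    """Map each word to a Counter of words co-occurring within +/- `window` tokens."""
    vectors = {}
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            ctx = vectors.setdefault(word, Counter())
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    ctx[tokens[j]] += 1
    return vectors


def project(vector, seed_dict):
    """Carry a source context vector into the target language via an assumed seed lexicon.
    Context words missing from the lexicon are dropped."""
    projected = Counter()
    for word, count in vector.items():
        if word in seed_dict:
            projected[seed_dict[word]] += count
    return projected


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (Counter returns 0 for missing keys)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def translation_probs(src_word, src_vecs, tgt_vecs, seed_dict, top_k=3):
    """Estimate P(t | s) by normalizing positive cosine scores over the top-k candidates.
    Target words already covered by the seed lexicon are excluded as candidates."""
    query_vec = project(src_vecs[src_word], seed_dict)
    known = set(seed_dict.values())
    scores = {t: cosine(query_vec, vec) for t, vec in tgt_vecs.items() if t not in known}
    ranked = sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]
    best = [(t, s) for t, s in ranked if s > 0]
    total = sum(s for _, s in best)
    return {t: s / total for t, s in best}


def translate_query_model(query_model, trans):
    """Target-language query model: P(t | Q) = sum_s P(t | s) * P(s | Q)."""
    target = Counter()
    for s, p_s in query_model.items():
        for t, p_t_given_s in trans.get(s, {}).items():
            target[t] += p_t_given_s * p_s
    return dict(target)


# Toy comparable corpora and seed lexicon (all hypothetical).
src = ["cat drinks milk", "dog eats meat", "cat eats fish"]
tgt = ["chat boit lait", "chien mange viande", "chat mange poisson"]
seed = {"cat": "chat", "drinks": "boit", "dog": "chien", "eats": "mange"}

src_vecs, tgt_vecs = context_vectors(src), context_vectors(tgt)
trans = {w: translation_probs(w, src_vecs, tgt_vecs, seed) for w in ("milk", "meat")}
query = translate_query_model({"milk": 0.75, "meat": 0.25}, trans)
print(trans)
print(query)
```

On this toy data "milk" maps mostly to "lait" (about 2/3 of the mass), with "poisson" taking the rest because both co-occur with "chat"; "meat" likewise favors "viande". The split illustrates why the way correlations are turned into probabilities matters: a different normalization (for example, keeping only the top candidate) would give a different target query model.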

Keywords: Translation model, Bilingual lexicon, Comparable corpora, Cross-Language Information Retrieval, Language modeling framework

Article history: Received 13 November 2014, Revised 10 August 2015, Accepted 17 August 2015, Available online 10 November 2015, Version of Record 19 February 2016.

Official paper link: https://doi.org/10.1016/j.ipm.2015.08.001