Cross-lingual document similarity estimation and dictionary generation with comparable corpora

Authors: Tadej Štajner, Dunja Mladenić

Abstract

This paper proposes an approach for performing bilingual dictionary generation even when trained on widely available comparable bilingual corpora. We also show its capability to provide cross-lingual similarity estimates that correlate well with human judgments. We implement an approach using a nonlinear bilingual translation model that we train using comparable corpora. We propose a method using word embeddings and kernel approximation to train scalable nonlinear transformations. We demonstrate that this novel method works better on a majority of evaluated language pairs.

Keywords: Cross-lingual text analysis, Vector space machine translation, Representation learning, Comparable corpora, Similarity learning, Dictionary generation

Paper URL: https://doi.org/10.1007/s10115-018-1179-9