Slavic languages in phrase-based statistical machine translation: a survey

Authors: Mirjam Sepesy Maučec, Janez Brest

Abstract

The demand for translations is increasing at a rate far beyond the capacity of professional translators. Translating everything from scratch into each language is too difficult, time-consuming and expensive. Machine translation offers a solution, as it produces translations automatically. Until recently, statistical machine translation was one of the most successful approaches. However, a new approach to machine translation based on neural networks has emerged with promising results. The present paper concerns phrase-based statistical machine translation, an area that has been studied extensively in the literature. The translation system consists of many components, each built on probabilistic models, and each component is described separately. Although high-quality translation systems have been developed for certain language pairs, there are still many languages for which translation output contains numerous errors. Languages with a rich morphology pose an especially difficult research challenge. We address one group of morphologically rich languages: Slavic languages, which constitute a relatively homogeneous family of languages characterized by rich, inflectional morphology. The present paper offers a comprehensive survey of approaches to coping with Slavic languages in different aspects of statistical machine translation. We observe that the community's interest in research on more difficult languages is increasing, and we believe that the translation quality for those languages will reach the level of practical use in the near future.

Keywords: Statistical machine translation, Morphology, Slavic language, Inflection, Free word order

Paper URL: https://doi.org/10.1007/s10462-017-9558-2