Learning finite-state models for machine translation

作者:Francisco Casacuberta, Enrique Vidal

摘要

In formal language theory, finite-state transducers are well-know models for simple “input-output” mappings between two languages. Even if more powerful, recursive models can be used to account for more complex mappings, it has been argued that the input-output relations underlying most usual natural language pairs can essentially be modeled by finite-state devices. Moreover, the relative simplicity of these mappings has recently led to the development of techniques for learning finite-state transducers from a training set of input-output sentence pairs of the languages considered. In the last years, these techniques have lead to the development of a number of machine translation systems. Under the statistical statement of machine translation, we overview here how modeling, learning and search problems can be solved by using stochastic finite-state transducers. We also review the results achieved by the systems we have developed under this paradigm. As a main conclusion of this review we argue that, as task complexity and training data scarcity increase, those systems which rely more on statistical techniques tend produce the best results.

论文关键词:Machine translation, Stochastic finite-state transducers, Grammatical inference

论文评审过程:

论文官网地址:https://doi.org/10.1007/s10994-006-9612-9