Evaluation of analogical proportions through Kolmogorov complexity

Authors:

Highlights:

Abstract

In this paper, we try to identify analogical proportions, i.e., statements of the form “a is to b as c is to d”, expressed in linguistic terms. While an algebraic model can conceivably test proportions such as “2 is to 4 as 5 is to 10”, or even “read is to reader as lecture is to lecturer”, no algebraic framework supports statements such as “engine is to car as heart is to human” or “wine is to France as beer is to England”, or helps to recognize them as meaningful analogical proportions. The idea is then to rely on text corpora, or even on the Web itself, where one may expect to find the pragmatics and semantics of words in their common use. In that context, to attach a numerical value to the “analogical ratio” corresponding to the phrase “a is to b”, we start from Kolmogorov's work on complexity theory. It provides the basis for a universal measure of the information content of a word a, or of a word a relative to another word b, which in practice is estimated statistically. We investigate the link between a recently introduced, purely logical view of analogical proportions and its counterpart based on Kolmogorov theory. The criteria proposed for testing candidate proportions agree with the expected properties of analogical proportions (symmetry, central permutation). This leads to a new computational method to define, and ultimately to try to detect, analogical proportions in natural language. Experiments with classifiers based on these ideas are reported, and the results are rather encouraging with respect to the recognition of common-sense linguistic analogies. The approach is also compared with existing work on similar problems.
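The statistical estimation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything here is an assumption: complexity is approximated as the negative base-2 logarithm of a relative page frequency, the conditional complexity of b given a comes from co-occurrence counts, and `ratio_gap` is a hypothetical criterion that only illustrates the symmetry property, not central permutation. The counts are invented toy values, not search-engine results.

```python
import math

# Hypothetical toy counts standing in for search-engine hit counts.
# They are invented for illustration and are NOT real data.
TOTAL_PAGES = 1_000_000
COUNTS = {
    frozenset({"engine"}): 20_000,
    frozenset({"car"}): 50_000,
    frozenset({"heart"}): 30_000,
    frozenset({"human"}): 60_000,
    frozenset({"beer"}): 40_000,
    frozenset({"engine", "car"}): 8_000,
    frozenset({"heart", "human"}): 12_000,
    frozenset({"beer", "human"}): 400,
}


def hits(*words: str) -> int:
    """Number of pages containing all the given words (toy lookup)."""
    return COUNTS[frozenset(words)]


def complexity(word: str) -> float:
    """Assumed estimate of K(word): -log2 of its relative page frequency."""
    return -math.log2(hits(word) / TOTAL_PAGES)


def conditional_complexity(b: str, a: str) -> float:
    """Assumed estimate of K(b | a): -log2 of the share of a's pages that also contain b."""
    return -math.log2(hits(a, b) / hits(a))


def ratio_gap(a: str, b: str, c: str, d: str) -> float:
    """Hypothetical criterion: how far the 'ratio' a:b is from the 'ratio' c:d.

    Smaller values suggest a more plausible proportion. Swapping (a, b)
    with (c, d) leaves the value unchanged (symmetry).
    """
    forward = abs(conditional_complexity(b, a) - conditional_complexity(d, c))
    backward = abs(conditional_complexity(a, b) - conditional_complexity(c, d))
    return forward + backward


if __name__ == "__main__":
    print(ratio_gap("engine", "car", "heart", "human"))
    print(ratio_gap("engine", "car", "beer", "human"))
```

With these toy counts, the candidate “engine is to car as heart is to human” yields a smaller gap than “engine is to car as beer is to human”. In a real setting the counts would come from a corpus or a search engine, and the decision threshold would have to be learned, for instance by a classifier as the abstract suggests.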

Keywords: Analogical proportion, Kolmogorov complexity, Common sense analogies, Search engine, Google

Review history: Available online 18 July 2011.

Official paper URL: https://doi.org/10.1016/j.knosys.2011.06.022