Combination of information retrieval methods with LESK algorithm for Arabic word sense disambiguation

Authors: Anis Zouaghi, Laroussi Merhbene, Mounir Zrigui

Abstract

In this paper, we propose combining the Harman, Croft, and Okapi measures with the Lesk algorithm to build a system for Arabic word sense disambiguation that joins unsupervised and knowledge-based methods. The system aims to resolve lexical semantic ambiguity in Arabic. The information retrieval measures estimate the most relevant sense of the ambiguous word by returning a semantic coherence score for the context that is semantically closest to the original sentence containing the ambiguous word. The Lesk algorithm then selects the appropriate sense from those proposed by the information retrieval measures. This selection is based on a comparison between the glosses of the word to be disambiguated and its different contexts of use extracted from a corpus. Our experimental study shows that using the Lesk algorithm with the Harman, Croft, and Okapi measures yields an accuracy rate of 73%.
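The two-stage idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: only the Okapi measure is shown, in its standard BM25 form, as a stand-in for all three measures; the way candidates are filtered and ranked is an assumption; and the function names, the English toy tokens, and the sense inventory are hypothetical. Each sense is assumed to have one gloss and at least one context of use.

```python
import math
from collections import Counter


def bm25_score(query, doc, docs, k1=1.2, b=0.75):
    """Standard Okapi BM25 score of a tokenized doc for a tokenized query."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    tf = Counter(doc)
    score = 0.0
    for term in set(query):
        df = sum(1 for d in docs if term in d)  # document frequency
        if df == 0 or tf[term] == 0:
            continue
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
        norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
        score += idf * tf[term] * (k1 + 1) / norm
    return score


def lesk_overlap(gloss, context):
    """Simplified Lesk: number of distinct words shared by gloss and context."""
    return len(set(gloss) & set(context))


def disambiguate(sentence, senses):
    """Pick a sense name; `senses` maps name -> {"gloss": [...], "contexts": [[...], ...]}."""
    all_contexts = [c for data in senses.values() for c in data["contexts"]]
    candidates = []
    for name, data in senses.items():
        # Stage 1 (assumed): keep the context of use closest to the sentence.
        ir_score, best_ctx = max(
            ((bm25_score(sentence, c, all_contexts), c) for c in data["contexts"]),
            key=lambda pair: pair[0],
        )
        if ir_score > 0:
            # Stage 2 (assumed): compare the gloss with that context.
            overlap = lesk_overlap(data["gloss"], best_ctx)
            candidates.append((overlap, ir_score, name))
    if not candidates:
        return None
    # Rank by gloss overlap, using the retrieval score only to break ties.
    return max(candidates, key=lambda c: (c[0], c[1]))[2]


# Hypothetical toy data in English, for illustration only.
toy_senses = {
    "eye": {
        "gloss": ["organ", "of", "sight", "vision"],
        "contexts": [["the", "doctor", "examined", "his", "eye", "sight"]],
    },
    "spring": {
        "gloss": ["source", "of", "water", "flows"],
        "contexts": [["cold", "water", "flows", "from", "the", "mountain"]],
    },
}
toy_sentence = ["water", "flows", "from", "the", "spring"]
chosen = disambiguate(toy_sentence, toy_senses)
```

On this toy input, both senses pass the retrieval stage, and the gloss comparison then favors the "spring" sense because its gloss shares two words with its retrieved context, against one for "eye". A real system would need Arabic tokenization and stemming, which this sketch omits.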

Keywords: Arabic word sense disambiguation (AWSD), Unsupervised and incremental approach, Knowledge-based approach, Information retrieval methods, Lesk algorithm

Official paper URL: https://doi.org/10.1007/s10462-011-9249-3