Detection of semantic errors in Arabic texts

Authors:

Abstract

Detecting semantic errors in text remains a challenging area of investigation. Much research has addressed lexical and syntactic errors, while fewer studies have tackled semantic errors, which are more difficult to treat. Compared to other languages, Arabic poses a particular challenge for this problem. Because many Arabic words are graphically very similar to each other, the risk of semantic errors in Arabic texts is higher. Moreover, the language has special cases and complexities of its own. This paper deals with the detection of semantic errors in Arabic texts, but the approach we have adopted can also be applied to texts in other languages. It combines four contextual methods (using statistical and linguistic information) to decide on the semantic validity of a word in a sentence. We chose to implement our approach on a distributed architecture, namely a Multi-Agent System (MAS). The implemented system achieved a precision of about 90% and a recall of about 83%.

Keywords: Semantic error, Detection, Statistical method, Linguistic method, Combining methods, Co-occurrence, Collocation, Latent Semantic Analysis (LSA), Multi-Agent System (MAS), Arabic

Article history: Received 11 August 2011, Revised 9 July 2012, Accepted 10 July 2012, Available online 15 July 2012.

Official article URL: https://doi.org/10.1016/j.artint.2012.07.002