A generalized framework for anaphora resolution in Indian languages

Authors:

Highlights:

Abstract

In this paper, we propose a joint model of feature selection and ensemble learning for anaphora resolution in resource-poor settings such as the Indian languages. The proposed approach is based on multi-objective differential evolution (DE) that optimises five coreference resolution scorers, namely Muc, Bcub, Ceafm, Ceafe and Blanc. The main goal is to determine the best combination of mention classifiers and the most relevant set of features for anaphora resolution. The proposed method is evaluated on three leading Indian languages, namely Hindi, Bengali and Tamil. Experiments on the benchmark datasets of the ICON-2011 Shared Task on Anaphora Resolution in Indian Languages show that our approach attains good accuracy, often better than that of the state-of-the-art systems. For Bengali, it achieves F-measure values of 71.89%, 59.61%, 52.55%, 34.45% and 72.52% for Muc, Bcub, Ceafm, Ceafe and Blanc, respectively. For Hindi, we obtain F-measure values of 33.27%, 63.06%, 49.59%, 49.06% and 55.45% for Muc, Bcub, Ceafm, Ceafe and Blanc, respectively. To further show the efficacy of the proposed algorithm, we evaluate it on Tamil, a language that belongs to a different family, and obtain F-measure values of 31.79%, 64.67%, 46.81%, 45.29% and 52.80% for Muc, Bcub, Ceafm, Ceafe and Blanc, respectively. Experiments on Dutch yield F-measure values of 17.67%, 74.43%, 58.08%, 59.21% and 55.58% for Muc, Bcub, Ceafm, Ceafe and Blanc, respectively.
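Optimising five scorers at once means that candidate solutions (a feature subset plus a classifier combination) are compared by Pareto dominance rather than by a single number. The sketch below illustrates only that comparison step, not the paper's DE procedure. The function names `dominates` and `pareto_front` and the score vectors are hypothetical, chosen for illustration and not taken from the paper's results.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if score vector a Pareto-dominates b (all objectives maximised).

    a dominates b when it is at least as good on every objective
    and strictly better on at least one.
    """
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: List[Sequence[float]]) -> List[int]:
    """Return the indices of candidates that no other candidate dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Hypothetical (Muc, Bcub, Ceafm, Ceafe, Blanc) F-measures for three
# candidate solutions; these numbers are made up for the example.
candidates = [
    (60.0, 55.0, 50.0, 30.0, 70.0),
    (58.0, 57.0, 49.0, 32.0, 69.0),
    (57.0, 54.0, 48.0, 29.0, 68.0),
]

front = pareto_front(candidates)
print(front)
```

Here the first two candidates trade off against each other (one is better on Muc, the other on Bcub), so both stay on the front, while the third is worse than the first on every metric and is discarded. In a multi-objective evolutionary setting, a filter of this kind would typically decide which solutions survive into the next generation.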

Keywords: Multiobjective optimization (MOO), Single objective optimization (SOO), Conditional random field (CRF), Support vector machine (SVM)

Article history: Received 16 May 2015, Revised 27 June 2016, Accepted 28 June 2016, Available online 6 July 2016, Version of Record 3 September 2016.

Paper URL: https://doi.org/10.1016/j.knosys.2016.06.033