Antonyms are similar: Towards paradigmatic association approach to rating similarity in SimLex-999 and WordSim-353


Abstract

SimLex-999 is a widely used lexical resource for tracking progress in word similarity computation. It anchors similarity in synonymy, whereas other researchers, such as Agirre et al. (2009), adopt a broader definition of similarity that also includes hyponymy and antonymy. Paradigmatic association, which covers synonymy, antonymy and co-hyponymy (Lapesa et al., 2014), largely overlaps with this broader definition. Two words are paradigmatically associated if they can replace one another without affecting the grammaticality or acceptability of the sentence. Paradigmatic association can therefore be elicited by asking raters about word interchangeability, which we hypothesize might be more natural than instructing them with a list of relations to consider. To validate the proposed approach, we reannotated WordSim353 and SimLex-999 using two new guidelines: one explicitly qualifies antonymy as a similarity relation, and the other elicits word interchangeability. As additional datasets, we present a crowdsourced version of WordSim353 and a Czech version of SimLex-999. The paper also includes a detailed analysis of the lexical content of SimLex-999 and a benchmark of thesaurus-based and distributional algorithms on multiple word similarity and relatedness datasets.
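Benchmarks on datasets such as SimLex-999 and WordSim353 are commonly scored by correlating a model's similarity scores with the human ratings using Spearman's rank correlation. The abstract does not spell out the paper's exact protocol, so the following is only a minimal sketch under that assumption: cosine similarity between word vectors, out-of-vocabulary pairs skipped, and ties given average ranks. The helper names (`evaluate`, `average_ranks`) and the toy vectors and ratings are hypothetical and invented for illustration; they are not entries from SimLex-999 or WordSim353.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(sxx * syy)


def evaluate(pairs: List[Tuple[str, str, float]],
             vectors: Dict[str, Sequence[float]]) -> Tuple[float, int]:
    """Score a model on (word1, word2, human_rating) triples.

    Pairs containing an out-of-vocabulary word are skipped; the number
    of pairs actually used is returned alongside rho.
    """
    gold, predicted = [], []
    for w1, w2, rating in pairs:
        if w1 in vectors and w2 in vectors:
            gold.append(rating)
            predicted.append(cosine(vectors[w1], vectors[w2]))
    return spearman(gold, predicted), len(gold)


# Hypothetical toy data: vectors and ratings are invented for illustration.
TOY_VECTORS = {
    "happy": [1.0, 0.9, 0.1],
    "glad": [0.9, 1.0, 0.2],
    "sad": [0.5, 0.5, 0.5],
    "car": [0.0, 0.1, 1.0],
}
TOY_PAIRS = [
    ("happy", "glad", 9.0),    # synonyms
    ("happy", "sad", 5.0),     # antonyms, rated as partly similar here
    ("happy", "car", 0.5),     # unrelated
    ("glad", "unicorn", 3.0),  # "unicorn" is out of vocabulary
]

if __name__ == "__main__":
    rho, used = evaluate(TOY_PAIRS, TOY_VECTORS)
    print(f"Spearman rho = {rho:.3f} on {used} pairs")
```

Running it prints `Spearman rho = 1.000 on 3 pairs`, because the toy cosines happen to order the three in-vocabulary pairs exactly as the made-up ratings do. Under a guideline that qualifies antonymy as similarity, a pair like happy/sad would be expected to receive a higher gold rating than under a synonymy-only guideline, which changes which models score well.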

Keywords: Word similarity, Word relatedness, WordSim353, SimLex-999

Article history: Received 14 March 2017, Revised 23 November 2017, Accepted 18 March 2018, Available online 11 April 2018, Version of Record 4 June 2018.

Paper URL: https://doi.org/10.1016/j.datak.2018.03.004