A large reproducible benchmark of ontology-based methods and word embeddings for word similarity

Authors:

Highlights:

• A reproducible benchmark of ontology-based similarity measures and word embeddings.

• The largest known set of reproducible experiments on word similarity and relatedness.

• A detailed reproducibility protocol based on the HESML library and a self-contained dataset.

• Our experiment file can be used as a template to evaluate other experimental setups.

• A long-term reproducibility protocol based on the Reprozip tool.

Keywords: Ontology-based semantic similarity measures, Word embeddings, Information Content models, Reproducible benchmark, HESML, Reprozip

Review history: Received 1 November 2019, Revised 3 September 2020, Accepted 5 September 2020, Available online 30 September 2020, Version of Record 14 October 2020.

Official paper URL: https://doi.org/10.1016/j.is.2020.101636