A large reproducible benchmark of ontology-based methods and word embeddings for word similarity
Authors:
Highlights:
• A reproducible benchmark of ontology-based similarity measures and word embeddings.
• The largest known set of reproducible experiments on word similarity and relatedness.
• A detailed reproducibility protocol based on the HESML library and a self-contained dataset.
• Our experiment file can be used as a template to evaluate other experimental setups.
• A long-term reproducibility protocol based on the Reprozip tool.
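Word-similarity benchmarks like this one are commonly scored by correlating the values a measure assigns to word pairs with human ratings from a gold-standard dataset. The following is a minimal sketch of that evaluation idea in plain Python. It assumes cosine similarity over word-embedding vectors and Pearson/Spearman correlation as the metrics. The toy vectors, the ratings, and every function name are hypothetical. None of it is part of HESML or of the paper's experiment files.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two dense word vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical toy embeddings (real models have hundreds of dimensions).
EMBEDDINGS: Dict[str, List[float]] = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.85, 0.15, 0.05],
    "road": [0.5, 0.5, 0.1],
    "banana": [0.0, 0.2, 0.9],
}

# Hypothetical gold-standard pairs with human similarity ratings (0-10 scale).
GOLD: List[Tuple[str, str, float]] = [
    ("car", "automobile", 9.5),
    ("car", "road", 6.0),
    ("car", "banana", 0.5),
    ("road", "banana", 1.0),
]


def evaluate(emb: Dict[str, List[float]], gold: List[Tuple[str, str, float]]) -> Tuple[float, float]:
    """Score every gold pair with cosine similarity and correlate with human ratings."""
    predicted = [cosine(emb[a], emb[b]) for a, b, _ in gold]
    human = [h for _, _, h in gold]
    return pearson(predicted, human), spearman(predicted, human)


if __name__ == "__main__":
    r, rho = evaluate(EMBEDDINGS, GOLD)
    print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

An ontology-based measure would slot in by replacing `cosine` with a function that scores the two words from a taxonomy such as WordNet. The correlation step stays the same, which is what lets both families of methods be compared on one benchmark.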
Keywords: Ontology-based semantic similarity measures, Word embeddings, Information Content models, Reproducible benchmark, HESML, Reprozip
Article history: Received 1 November 2019, Revised 3 September 2020, Accepted 5 September 2020, Available online 30 September 2020, Version of Record 14 October 2020.
Paper URL: https://doi.org/10.1016/j.is.2020.101636