Synset expansion on translation graph for automatic wordnet construction
Abstract
Research on clustering algorithms in monolingual synonymy graphs has yielded promising results, but the idea has not yet been explored in a multilingual setting. Moving the problem to a multilingual translation graph enables the use of clues and techniques that are not available in a monolingual synonymy graph. This article explores the potential of sense induction methods in a massively multilingual translation graph. For this purpose, the performance of graph clustering methods in synset detection is investigated. In a translation graph, existing Wordnets in different languages are an important clue for synset detection that cannot be utilized in a monolingual setting. Casting the problem as an unsupervised synset expansion task rather than a clustering or community detection task improves the results substantially. Furthermore, instead of a greedy unsupervised expansion algorithm guided by heuristics, we devise a supervised learning algorithm that learns synset expansion patterns from the words in existing Wordnets and achieves superior results. Because the training data is drawn from already existing Wordnets, manual labeling is not required, in contrast to previous work. To evaluate our methods, Wordnets for Slovenian, Persian, German and Russian are built from scratch and compared to their manually built Wordnets or labeled test sets. Results reveal a clear improvement over two state-of-the-art algorithms targeting massively multilingual Wordnets and competitive results against Wordnet construction methods targeting a single language. The system is able to produce Wordnets from scratch with a Wordnet base concept coverage ranging from 20% to 88% for 51 languages and expands existing Wordnets by up to 30%.
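As a rough illustration of what greedy synset expansion on a translation graph can look like, the toy sketch below grows a seed synset by repeatedly adding the neighbouring word that is linked to the largest share of current members. It is not the algorithm from the paper: the scoring heuristic, the `threshold` parameter, the function names and the six-word graph are all assumptions made up for this example.

```python
from collections import defaultdict


def build_graph(pairs):
    """Build an undirected translation graph from (word, word) pairs."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def expand_synset(graph, seed, threshold=0.6):
    """Greedily grow `seed` with well-connected neighbours.

    Hypothetical heuristic: a candidate's score is the fraction of
    current synset members it is directly linked to.
    """
    synset = set(seed)
    while True:
        # Candidates are all neighbours of the synset not yet in it.
        candidates = set().union(*(graph[w] for w in synset)) - synset
        best, best_score = None, 0.0
        for cand in sorted(candidates):  # sorted for deterministic ties
            score = len(graph[cand] & synset) / len(synset)
            if score > best_score:
                best, best_score = cand, score
        if best is None or best_score < threshold:
            return synset
        synset.add(best)


# Toy data (invented): English "bank" is ambiguous between the
# financial sense and the river-side sense.
pairs = [
    (("en", "bank"), ("de", "Bank")),
    (("en", "bank"), ("fr", "banque")),
    (("de", "Bank"), ("fr", "banque")),
    (("en", "bank"), ("de", "Ufer")),
    (("en", "shore"), ("de", "Ufer")),
    (("en", "shore"), ("fr", "rive")),
    (("de", "Ufer"), ("fr", "rive")),
]
graph = build_graph(pairs)

# Seed taken from words assumed to be in existing Wordnets already.
result = expand_synset(graph, {("en", "shore"), ("de", "Ufer")})
print(sorted(result))
```

In this toy graph the French word "rive" is linked to both seed words and is added, while the ambiguous "bank" is linked to only one member and stays out. A supervised variant, as the abstract describes, would replace the hand-set threshold with a model trained on expansions observed in existing Wordnets.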
Keywords: Automated wordnet construction, Translation graph, Sense induction, Synset detection
Article history: Received 17 February 2018, Revised 30 September 2018, Accepted 5 October 2018, Available online 16 October 2018, Version of Record 16 October 2018.
Official paper URL: https://doi.org/10.1016/j.ipm.2018.10.002