Clustering of semantically enriched short texts

Authors: Marek Kozlowski, Henryk Rybinski

Abstract

The paper addresses the problem of clustering small sets of very short texts. Such texts are often incomplete and highly inconclusive, so defining a notion of proximity between them is challenging. To cope with polysemy, we adapt the SenseSearcher algorithm (SnS) of Kozlowski and Rybinski (Computational Intelligence 33(3): 335–367, 2017b). In addition, we test whether enriching ultra-short texts semantically improves the quality of their clustering. We present two enrichment approaches: one based on neural distributional models and the other based on external knowledge resources. Both approaches are evaluated with SnSRC and other knowledge-poor clustering algorithms.
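To illustrate why semantic enrichment can help, the following is a minimal sketch of one generic distributional enrichment scheme: each token of a short text is expanded with its nearest neighbour in a word-vector space, so that texts sharing no surface words can still overlap. This is not the authors' method. The toy vectors in `VECTORS`, the helper names `enrich` and `jaccard`, the neighbour count `k`, and the choice of Jaccard similarity are all hypothetical assumptions made for illustration; a real system would use vectors from a trained model such as word2vec.

```python
import math

# Hypothetical toy word vectors standing in for a trained distributional
# model; the values are invented for illustration only.
VECTORS = {
    "jaguar": [0.9, 0.1, 0.0],
    "car":    [0.8, 0.2, 0.1],
    "engine": [0.7, 0.3, 0.0],
    "cat":    [0.1, 0.9, 0.2],
    "feline": [0.0, 0.95, 0.1],
    "zoo":    [0.2, 0.7, 0.5],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def enrich(tokens, k=1):
    """Return the token set extended with the k nearest vocabulary words
    of every token that has a vector (hypothetical enrichment step)."""
    enriched = set(tokens)
    for t in tokens:
        if t not in VECTORS:
            continue  # unknown words are kept but not expanded
        neighbours = sorted(
            (w for w in VECTORS if w != t),
            key=lambda w: cosine(VECTORS[t], VECTORS[w]),
            reverse=True,
        )
        enriched.update(neighbours[:k])
    return enriched


def jaccard(a, b):
    """Jaccard overlap of two token collections (0.0 if both are empty)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


if __name__ == "__main__":
    short_a, short_b = ["jaguar", "engine"], ["car"]
    print("raw:", jaccard(short_a, short_b))
    print("enriched:", jaccard(enrich(short_a), enrich(short_b)))
```

With these toy vectors, "jaguar engine" and "car" share no words, so their raw overlap is 0.0, while after enrichment both contain "jaguar" and "car" and the overlap rises to 2/3. Unrelated texts such as "cat" and "car" still have zero overlap, which is the behaviour a clustering algorithm would want from enrichment.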

Keywords: Document clustering, Information retrieval, Semantic enrichment

Paper URL: https://doi.org/10.1007/s10844-018-0541-4