Clustering of semantically enriched short texts
Authors: Marek Kozlowski, Henryk Rybinski
Abstract
This paper addresses the clustering of small sets of very short texts. Such texts are often incomplete and highly inconclusive, so establishing a notion of proximity between them is a challenging task. To cope with polysemy, we adapt the SenseSearcher algorithm (SnS; Kozlowski and Rybinski, Computational Intelligence 33(3): 335–367, 2017b). In addition, we test whether semantically enriching ultra-short texts improves the quality of their clustering. We present two approaches, one based on neural distributional models and the other on external knowledge resources. The approaches are tested on SnSRC and other knowledge-poor algorithms.
Keywords: Document clustering, Information retrieval, Semantic enrichment
Paper URL: https://doi.org/10.1007/s10844-018-0541-4