Short text clustering by finding core terms

Authors: Xingliang Ni, Xiaojun Quan, Zhi Lu, Liu Wenyin, Bei Hua

Abstract

A new clustering strategy, TermCut, is presented to cluster short text snippets by finding core terms in the corpus. We model the collection of short text snippets as a graph in which each vertex represents one short text snippet and each weighted edge measures the relationship between the two vertices it connects. TermCut is then applied to recursively select a core term and bisect the graph such that the snippets in one part of the graph contain the term, whereas the snippets in the other part do not. We apply the proposed method to different types of short text snippets, including questions and search results. Experimental results show that the proposed method outperforms state-of-the-art clustering algorithms for clustering short text snippets.
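The recursive bisection described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state how the core term is chosen or when recursion stops, so everything specific here is an assumption made for illustration: the Jaccard edge weight, the score (mean within-part weight minus mean cross-part weight), the `min_score` stopping threshold, and the function names `edge_weight`, `split_score`, and `term_bisect`. It is not the paper's TermCut criterion.

```python
from itertools import combinations


def edge_weight(a, b):
    # Assumed edge weight: Jaccard overlap of the two snippets' term sets.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean(values):
    return sum(values) / len(values) if values else 0.0


def split_score(docs, ids, term):
    # Hypothetical criterion (not from the paper): mean weight of edges kept
    # inside the two parts minus mean weight of edges cut by the bisection.
    with_term = {i for i in ids if term in docs[i]}
    if not with_term or len(with_term) == len(ids):
        return None  # the term does not bisect this set of snippets
    intra, cross = [], []
    for i, j in combinations(ids, 2):
        same_side = (i in with_term) == (j in with_term)
        (intra if same_side else cross).append(edge_weight(docs[i], docs[j]))
    return mean(intra) - mean(cross)


def term_bisect(docs, ids=None, min_score=0.1):
    # Recursively pick the best-scoring term and split the snippets into
    # those that contain it and those that do not.
    if ids is None:
        ids = list(range(len(docs)))
    terms = sorted(set().union(*(docs[i] for i in ids)))
    scored = []
    for term in terms:
        score = split_score(docs, ids, term)
        if score is not None:
            scored.append((score, term))
    if not scored:
        return [ids]
    best_score, core_term = max(scored, key=lambda pair: pair[0])
    if best_score < min_score:  # assumed stopping rule
        return [ids]
    with_term = [i for i in ids if core_term in docs[i]]
    without_term = [i for i in ids if core_term not in docs[i]]
    return (term_bisect(docs, with_term, min_score)
            + term_bisect(docs, without_term, min_score))


# Toy snippets, already tokenized into term sets (made-up data).
snippets = [
    {"python", "list", "sort"},
    {"python", "dict", "sort"},
    {"java", "thread", "lock"},
    {"java", "thread", "pool"},
]
clusters = term_bisect(snippets)
```

Each returned cluster is a list of snippet indices. On the toy data, the two programming-language groups share no terms, so a single bisection separates them and neither half scores high enough to be split further.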

Keywords: Clustering, Short text clustering, TermCut

Paper URL: https://doi.org/10.1007/s10115-010-0299-7