Soft memberships for spectral clustering, with application to permeable language distinction

Abstract

Recently, a large amount of work has been devoted to the study of spectral clustering, a powerful unsupervised classification method. This paper contributes to both its foundations and its applications to text classification. Departing from the mainstream, which is concerned with hard membership, we study the extension of spectral clustering to soft membership (probabilistic, EM-style) assignments. One of its key features is that it avoids the complexity gap of hard membership. We apply this theory to a challenging problem, text clustering for languages having permeable borders, via a novel construction of Markov chains from corpora. Experiments with readily available code clearly display the potential of the method, which yields a visually appealing soft distinction of the languages that may together make up a whole corpus.

Keywords: Spectral clustering, Soft membership, Stochastic processes, Text classification

Review history: Received 19 September 2007, Revised 10 April 2008, Accepted 24 June 2008, Available online 3 July 2008.

Paper URL: https://doi.org/10.1016/j.patcog.2008.06.024