Possibilistic fuzzy co-clustering of large document collections

Abstract

In this paper, we propose a new co-clustering algorithm, possibilistic fuzzy co-clustering (PFCC), for the automatic categorization of large document collections. PFCC integrates a possibilistic document clustering technique and a combined formulation of fuzzy word ranking and partitioning into a fast iterative co-clustering procedure. This framework offers several benefits at once: robustness in the presence of document and word outliers, rich representations of co-clusters, highly descriptive document clusters, good performance in high-dimensional spaces, and reduced sensitivity to initialization in the possibilistic clustering. We present the detailed formulation of PFCC together with the motivations behind it. We also discuss its advantages over existing approaches and provide a proof of the algorithm's convergence. Experiments on several large document data sets demonstrate the effectiveness of PFCC.

Keywords: Co-clustering, Possibilistic clustering, Fuzzy clustering, Document clustering, Text mining, Information retrieval

Article history: Received 19 September 2006, Revised 11 April 2007, Accepted 21 April 2007, Available online 13 May 2007.

Official paper URL: https://doi.org/10.1016/j.patcog.2007.04.017