Semi-supervised fuzzy co-clustering algorithm for document categorization

Authors: Yang Yan, Lihui Chen, William-Chandra Tjhi

Abstract

In this paper, we propose a new semi-supervised fuzzy co-clustering algorithm, called SS-FCC, for categorizing large web document collections. In this approach, prior domain knowledge about a dataset, supplied by users in the form of pairwise constraints, is incorporated into the fuzzy co-clustering framework. Each constraint specifies whether a pair of objects "must" or "cannot" be clustered together. Using these constraints, the clustering problem is formulated as the maximization of a competitive agglomeration cost function with fuzzy terms that takes the provided domain knowledge into account. We derive update rules for the fuzzy memberships and design an iterative algorithm for the soft co-clustering process. Our experimental studies show that the proposed approach can significantly improve the quality of the clustering results. Simulations on 10 large benchmark datasets demonstrate the strengths and potential of SS-FCC, compared with some existing semi-supervised algorithms, in terms of performance evaluation criteria, stability, and running time.
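To make the idea of must-link and cannot-link constraints on fuzzy memberships concrete, the sketch below scores how strongly a document-cluster membership matrix violates a set of pairwise constraints. It is an illustrative sketch only: the function name `constraint_penalty` and the product-of-memberships penalty are our assumptions (a form commonly used in constrained fuzzy clustering), not the SS-FCC cost function or update rules from the paper, and it ignores the word memberships that a co-clustering method would also maintain.

```python
import numpy as np


def constraint_penalty(U, must_link, cannot_link):
    """Hypothetical fuzzy penalty for pairwise constraints (not SS-FCC's exact term).

    U           -- (n_docs, n_clusters) array of fuzzy document memberships
    must_link   -- list of (i, j) document pairs that should share a cluster
    cannot_link -- list of (i, j) document pairs that should be separated
    """
    U = np.asarray(U, dtype=float)
    penalty = 0.0
    for i, j in must_link:
        # Membership mass placing i and j in *different* clusters:
        # sum over c != l of u_ic * u_jl = (all pairs) - (same-cluster pairs).
        penalty += np.outer(U[i], U[j]).sum() - U[i] @ U[j]
    for i, j in cannot_link:
        # Membership mass placing i and j in the *same* cluster.
        penalty += U[i] @ U[j]
    return float(penalty)


if __name__ == "__main__":
    # Three documents, two clusters; doc 0 and doc 1 lean toward cluster 0.
    U = np.array([[0.9, 0.1],
                  [0.8, 0.2],
                  [0.1, 0.9]])
    print(constraint_penalty(U, must_link=[(0, 1)], cannot_link=[(0, 2)]))
```

With crisp memberships that satisfy every constraint the penalty is zero; softer or conflicting memberships raise it, which is why a term of this kind can be added to a clustering objective to steer the membership updates toward the user's constraints.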

Keywords: Semi-supervised clustering, Fuzzy co-clustering, Must-link/cannot-link constraints, Document categorization

Official paper URL: https://doi.org/10.1007/s10115-011-0454-9