Co-clustering over multiple dynamic data streams based on non-negative matrix factorization

Authors: Chun-Yan Sang, Di-Hua Sun

Abstract

Clustering multiple data streams has become an active area of research with many practical applications. Most of the early work in this area focused on one-sided clustering, i.e., clustering data streams based on feature correlation. However, recent research has shown that data streams can be grouped based on the distribution of their features, while features can be grouped based on their distribution across data streams. In this paper, an evolutionary clustering algorithm is proposed for multiple data streams using graph-regularized non-negative matrix factorization (EC-NMF), in which the geometric structure of both the data manifold and the feature manifold is considered. Instead of directly clustering multiple data streams periodically, EC-NMF works in the low-rank approximation subspace and incorporates prior knowledge from historical results through temporal smoothness. Furthermore, we develop an iterative algorithm and provide theoretical proofs of its convergence and correctness. The effectiveness and efficiency of the algorithm are both demonstrated in experiments on real and synthetic data sets. The results show that the proposed EC-NMF algorithm outperforms existing methods for clustering multiple data streams evolving over time.
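The abstract does not give the EC-NMF update rules, so the following is only a rough sketch of the general idea of factorizing a window of data while staying close to the previous window's result. It is not the authors' algorithm: it omits graph regularization entirely and uses standard multiplicative updates for the assumed objective ||X - WH||² + alpha·||H - H_prev||². The names `smooth_nmf`, `alpha`, and `H_prev`, and the layout of X (rows as streams, columns as features), are assumptions made for illustration.

```python
import random

def matmul(A, B):
    """Plain-Python product of two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def objective(X, W, H, H_prev, alpha):
    """Squared reconstruction error plus the temporal-smoothness penalty."""
    R = matmul(W, H)
    fit = sum((x - r) ** 2 for xr, rr in zip(X, R) for x, r in zip(xr, rr))
    drift = sum((h - p) ** 2 for hr, pr in zip(H, H_prev) for h, p in zip(hr, pr))
    return fit + alpha * drift

def smooth_nmf(X, H_prev, k, alpha=0.5, iters=200, seed=0, eps=1e-9):
    """Factor X (n x m) into W (n x k) and H (k x m), both non-negative,
    while penalizing the distance between H and the previous factor H_prev.
    Hypothetical helper; assumes X and H_prev are non-negative."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    history = [objective(X, W, H, H_prev, alpha)]
    for _ in range(iters):
        # W <- W * (X H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (d + eps) for w, a, d in zip(wr, ar, dr)]
             for wr, ar, dr in zip(W, num, den)]
        # H <- H * (W^T X + alpha H_prev) / (W^T W H + alpha H)
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[h * (a + alpha * p) / (d + alpha * h + eps)
              for h, a, p, d in zip(hr, ar, pr, dr)]
             for hr, ar, pr, dr in zip(H, num, H_prev, den)]
        history.append(objective(X, W, H, H_prev, alpha))
    return W, H, history

# Toy window: 4 streams x 4 features with two visible blocks (made-up data).
X = [[5, 4, 0, 0], [4, 5, 0, 1], [0, 0, 5, 4], [1, 0, 4, 5]]
# Assumed feature factor carried over from the previous window.
H_prev = [[1, 1, 0, 0], [0, 0, 1, 1]]
W, H, history = smooth_nmf(X, H_prev, k=2)
# One simple way to read off stream clusters: the largest entry in each row of W.
stream_labels = [row.index(max(row)) for row in W]
```

In this sketch a larger `alpha` keeps the new factor closer to the previous one, which is one common way to express temporal smoothness; the weighting actually used in the paper is not stated in the abstract.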

Keywords: Low-rank approximation, Non-negative matrix factorization, Graph regularization, Co-clustering

Paper URL: https://doi.org/10.1007/s10489-014-0526-0