Discovering cluster evolution patterns with the Cluster Association-aware matrix factorization

Authors: Wathsala Anupama Mohotti, Richi Nayak

Abstract

Tracking document collections over time (or across domains) is helpful in several applications, such as finding the dynamics of terminologies, identifying emerging and evolving trends, and detecting concept drift. We propose a novel ‘Cluster Association-aware’ Non-negative Matrix Factorization (NMF)-based method with graph-based visualization to identify the changing dynamics of text clusters over time or across domains. NMF is used to find similar clusters across the set of clustering solutions. Based on these similarities, four major lifecycle states of clusters, namely birth, split, merge and death, are tracked to discover their emergence, growth, persistence and decay. The novel concepts of ‘cluster associations’ and term frequency-based ‘cluster density’ are used to improve the quality of the evolution patterns. The cluster evolution is visualized using a k-partite graph. Empirical analysis on text data shows that the proposed method produces accurate and efficient solutions compared to state-of-the-art methods.
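The four lifecycle states named in the abstract can be illustrated with a small sketch. This is not the authors' method: it replaces the NMF-based similarity step with a plain cosine-similarity threshold over cluster term-weight vectors, and it omits cluster associations and cluster density entirely. The function names, the `threshold` value, and the toy clusters are all hypothetical, chosen only to show how birth, split, merge and death could be read off the links between two consecutive clustering solutions.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = sqrt(sum(w * w for w in u.values()))
    nv = sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def lifecycle_states(prev, curr, threshold=0.5):
    """Label cluster transitions between two consecutive clustering solutions.

    prev, curr: dicts mapping cluster id -> {term: weight}.
    Returns (edges, states); edges are the (prev id, curr id) pairs judged similar.
    The threshold is an assumed value, not one taken from the paper.
    """
    edges = [(p, c) for p in prev for c in curr
             if cosine(prev[p], curr[c]) >= threshold]
    states = {}
    for p in prev:
        successors = [c for q, c in edges if q == p]
        if not successors:
            states[("prev", p)] = "death"   # no similar cluster follows
        elif len(successors) > 1:
            states[("prev", p)] = "split"   # one cluster feeds several
    for c in curr:
        predecessors = [p for p, d in edges if d == c]
        if not predecessors:
            states[("curr", c)] = "birth"   # no similar cluster precedes
        elif len(predecessors) > 1:
            states[("curr", c)] = "merge"   # several clusters feed one
    return edges, states

# Toy clusters (hypothetical data) for two consecutive time steps.
prev = {
    "A": {"virus": 1.0, "vaccine": 1.0},
    "B": {"election": 1.0, "vote": 1.0},
    "C": {"football": 1.0, "goal": 1.0},
    "D": {"market": 1.0, "stock": 1.0},
}
curr = {
    "X": {"virus": 1.0, "vaccine": 1.0, "election": 1.0, "vote": 1.0},
    "Y": {"football": 1.0},
    "Z": {"goal": 1.0},
    "W": {"climate": 1.0, "carbon": 1.0},
}

edges, states = lifecycle_states(prev, curr)
```

In this toy run, clusters A and B merge into X, C splits into Y and Z, D dies, and W is born. The `edges` list forms a bipartite graph between the two time steps; chaining such edge lists over k clustering solutions gives a k-partite graph of the kind the abstract uses for visualization.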

Keywords: Cluster evolution, Text mining, Matrix factorization

Paper URL: https://doi.org/10.1007/s10115-021-01561-9