Constrained neighborhood preserving concept factorization for data representation

Authors:

Highlights:

Abstract

Matrix factorization techniques such as nonnegative matrix factorization (NMF) and concept factorization (CF) have attracted a great deal of attention in recent years, mainly because of their capacity for dimension reduction and sparse data representation. Both techniques are unsupervised and thus do not use a priori knowledge to guide the clustering process, which can lead to inferior performance in some scenarios. As a remedy, a semi-supervised learning method called Pairwise Constrained Concept Factorization (PCCF) was introduced to incorporate pairwise constraints into the CF framework. Despite its improved performance, PCCF uses only a priori knowledge and neglects the proximity information of the whole data distribution; this can lead to rather poor performance (although slightly better than CF) when only limited a priori information is available. To address this issue, we propose a novel method called Constrained Neighborhood Preserving Concept Factorization (CNPCF). CNPCF uses both a priori knowledge and the local geometric structure of the dataset to guide its clustering. Experimental studies on three real-world clustering tasks demonstrate that our method yields a better data representation and achieves much improved clustering accuracy and mutual information compared to state-of-the-art techniques.
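
For orientation, the following is a minimal sketch of plain, unsupervised concept factorization, the baseline that PCCF and CNPCF build on. It is not the CNPCF method: the pairwise constraints and the neighborhood preserving term are not described in enough detail here to reproduce. The sketch assumes the standard CF formulation, in which a nonnegative data matrix X (features by samples) is approximated as X W V^T with nonnegative W and V, fitted by multiplicative updates on the linear kernel K = X^T X. The function names, the random initialization, and the iteration count are illustrative assumptions.

```python
import numpy as np


def concept_factorization(X, k, n_iter=200, eps=1e-10, seed=0):
    """Plain CF sketch: approximate X (features x samples) by X @ W @ V.T.

    W and V are nonnegative (samples x k) matrices. This is the unsupervised
    baseline only; it contains no constraint or neighborhood terms.
    """
    rng = np.random.default_rng(seed)
    n_samples = X.shape[1]
    K = X.T @ X  # linear kernel; entrywise nonnegative when X is nonnegative
    W = rng.random((n_samples, k))
    V = rng.random((n_samples, k))
    for _ in range(n_iter):
        # Multiplicative updates keep W and V nonnegative; eps avoids 0/0.
        W *= (K @ V) / (K @ W @ (V.T @ V) + eps)
        V *= (K @ W) / (V @ (W.T @ K @ W) + eps)
    return W, V


def reconstruction_error(X, W, V):
    """Frobenius norm of the residual X - X W V^T."""
    return np.linalg.norm(X - X @ W @ V.T, "fro")


# Toy usage on random nonnegative data (illustrative, not from the paper).
X_demo = np.random.default_rng(1).random((20, 30))
W_demo, V_demo = concept_factorization(X_demo, k=3)
labels_demo = np.argmax(V_demo, axis=1)  # one cluster index per sample
```

Each row of V holds one sample's coefficients over the k concepts, so taking the largest entry per row is one simple way to read off cluster assignments; running k-means on the rows of V is another common choice.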

Keywords: Concept factorization, Locally consistent concept factorization, Semi-supervised document clustering, Neighborhood preserving, Data representation

Review history: Received 27 September 2015, Revised 24 March 2016, Accepted 3 April 2016, Available online 6 April 2016, Version of Record 23 April 2016.

Paper URL: https://doi.org/10.1016/j.knosys.2016.04.003