Robust unsupervised feature selection via dual self-representation and manifold regularization

Abstract

Unsupervised feature selection has become an important and challenging pre-processing step in machine learning and data mining, because large amounts of unlabelled high-dimensional data often need to be processed. In this paper, we propose an efficient method for robust unsupervised feature selection via dual self-representation and manifold regularization, referred to as DSRMR. On the one hand, a feature self-representation term is used to learn the feature representation coefficient matrix, which measures the importance of the different feature dimensions. On the other hand, a sample self-representation term is used to automatically learn the sample similarity graph, which preserves the local geometrical structure of the data; this structure has been shown to be critical in unsupervised feature selection. By using the l2,1-norm to regularize both the feature representation residual matrix and the representation coefficient matrix, our method is robust to outliers, and the row sparsity of the feature coefficient matrix induced by the l2,1-norm can effectively select representative features. During the optimization process, the feature coefficient matrix and the sample similarity graph constrain each other to obtain an optimal solution. Experimental results on ten real-world data sets demonstrate that the proposed method can effectively identify important features, outperforming many state-of-the-art unsupervised feature selection methods in terms of clustering accuracy (ACC) and normalized mutual information (NMI).
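To make the role of the l2,1-norm in the abstract concrete, the following is a minimal sketch and not the DSRMR optimization itself. It assumes a feature coefficient matrix W has already been learned (here a small hand-written example), computes its l2,1-norm as the sum of the Euclidean norms of its rows, and ranks features by row norm. The function names `l21_norm` and `rank_features` are hypothetical and chosen only for illustration.

```python
import numpy as np

def l21_norm(M):
    """l2,1-norm: sum of the Euclidean (l2) norms of the rows of M."""
    return float(np.sum(np.linalg.norm(M, axis=1)))

def rank_features(W, k):
    """Return the indices of the k rows of W with the largest l2 norm.

    Assumption: row i of W holds the coefficients of feature i, so a
    near-zero row marks a feature that contributes little.
    """
    scores = np.linalg.norm(W, axis=1)
    # Stable sort on negated scores gives descending order with
    # ties broken by the original feature index.
    return np.argsort(-scores, kind="stable")[:k]

# Hypothetical 3-feature coefficient matrix; the second row is all zeros.
W = np.array([[3.0, 4.0],
              [0.0, 0.0],
              [1.0, 0.0]])

print(l21_norm(W))          # row norms are 5, 0 and 1
print(rank_features(W, 2))  # features with the two largest row norms
```

Because the penalty sums unsquared row norms, it tends to drive entire rows toward zero, which is why row sparsity can be read directly as a feature ranking.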

Keywords: Unsupervised feature selection, Local geometric structure, Similarity preservation, Self-representation, Graph learning

Article history: Received 22 August 2017, Revised 25 December 2017, Accepted 4 January 2018, Available online 5 January 2018, Version of Record 20 February 2018.

Official paper URL: https://doi.org/10.1016/j.knosys.2018.01.009