Selecting feature subset with sparsity and low redundancy for unsupervised learning

Authors:

Highlights:

Abstract

Feature selection techniques are attracting increasing attention as more and more domains produce high-dimensional data. Due to the absence of class labels, many researchers focus on the unsupervised scenario, attempting to find an optimal feature subset that preserves the original data distribution. However, existing methods either fail to achieve sparsity or ignore the potential redundancy among features. In this paper, we propose a novel unsupervised feature selection algorithm that retains the structure-preserving power of the full feature set while enforcing high sparsity and low redundancy in a unified manner. On the one hand, to preserve the data structure of the whole feature set, we build the graph Laplacian matrix and learn pseudo class labels through spectral analysis. By learning a feature weight matrix, we map the original data into a low-dimensional space guided by the pseudo labels. On the other hand, to ensure sparsity and low redundancy simultaneously, we introduce a novel regularization term into the objective function with nonnegativity constraints imposed, which can be viewed as a combination of the matrix norms ‖·‖_{m1} and ‖·‖_{m2} applied to the feature weights. An iterative multiplicative algorithm with proven convergence is designed to efficiently solve the constrained optimization problem. Extensive experimental results on different real-world data sets demonstrate the promising performance of our proposed method over state-of-the-art approaches.
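The pipeline described above (graph Laplacian, spectral pseudo labels, nonnegative multiplicative updates) can be sketched in NumPy. This is a minimal illustration, not the paper's algorithm: the kNN/RBF graph construction, the function names, and the plain nonnegative least-squares objective are all assumptions, and the paper's sparsity/low-redundancy regularizer is omitted here. The update rule is the standard Ding-style semi-NMF multiplicative step for a quadratic objective with a nonnegative weight matrix.

```python
import numpy as np

def knn_graph_laplacian(X, k=5, sigma=1.0):
    """Unnormalized graph Laplacian L = D - S of a kNN similarity graph.
    kNN construction and RBF weights are illustrative assumptions; the
    paper's exact graph construction may differ."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # squared distances
    S = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]                 # skip self at index 0
        S[i, nbrs] = np.exp(-d2[i, nbrs] / (2 * sigma ** 2))
    S = np.maximum(S, S.T)                                # symmetrize
    return np.diag(S.sum(1)) - S

def spectral_pseudo_labels(L, c):
    """Pseudo class labels: eigenvectors of L for the c smallest eigenvalues."""
    _, vecs = np.linalg.eigh(L)
    return vecs[:, :c]

def multiplicative_nls(X, F, n_iter=200, eps=1e-12):
    """Multiplicative updates for min_W ||X W - F||_F^2 s.t. W >= 0.
    Generic semi-NMF-style rule; the paper's objective additionally
    includes the sparsity/redundancy regularization term."""
    A, B = X.T @ F, X.T @ X
    Ap, An = (np.abs(A) + A) / 2, (np.abs(A) - A) / 2     # positive/negative parts
    Bp, Bn = (np.abs(B) + B) / 2, (np.abs(B) - B) / 2
    W = np.random.default_rng(0).random((X.shape[1], F.shape[1]))
    for _ in range(n_iter):
        W *= np.sqrt((Ap + Bn @ W) / (An + Bp @ W + eps))
    return W

# Toy data: two well-separated Gaussian blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (20, 4)), rng.normal(3, 0.1, (20, 4))])
L = knn_graph_laplacian(X, k=5)
F = spectral_pseudo_labels(L, c=2)
W = multiplicative_nls(X, F)
```

In a feature-selection setting, the row norms of `W` would then be used to rank features; the paper's regularizer is what drives rows of the weight matrix toward zero (sparsity) and discourages correlated features from receiving large weights simultaneously (low redundancy).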

Keywords: Unsupervised feature selection, Nonnegative spectral analysis, Sparsity and low redundancy

Article history: Received 21 October 2014, Revised 6 June 2015, Accepted 8 June 2015, Available online 16 June 2015, Version of Record 31 July 2015.

DOI: https://doi.org/10.1016/j.knosys.2015.06.008