Unsupervised soft-label feature selection

Authors:

Highlights:

Abstract

Unsupervised feature selection is an important task in various research fields. Selecting discriminative features in the unsupervised setting is difficult because no labels are available to guide the selection. Recent works use pseudo labels as guidance. However, they generate the pseudo labels from the original feature space, where noise, redundancy, and outliers may degrade label quality. In addition, they ignore data fuzziness and use hard labels as the semantic supervision for feature selection, so the selected features suffer from significant information loss and a lack of semantics. To tackle these problems, we propose an effective Unsupervised Soft-label Feature Selection (USFS) model, which learns soft labels and simultaneously uses them to guide the unsupervised feature selection process. Specifically, we transform the data into a low-dimensional subspace, where an affinity matrix with a sparsity constraint is learned from local distances. This affinity matrix is taken as the soft-label matrix and is further used to guide the final feature selection. A simple yet efficient optimization method is derived to solve the formulated problem iteratively. Experimental results on widely used benchmarks show that the proposed method outperforms state-of-the-art approaches. For reproducibility, we provide the code and test datasets at https://github.com/wang-feifei/USFS-code.
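
The pipeline outlined in the abstract (project, learn a sparse affinity from local distances, use it as soft labels to score features) can be illustrated with a rough one-pass sketch. This is not the USFS objective or its iterative solver; several pieces are stand-ins chosen for illustration and are assumptions: PCA replaces the learned projection, a closed-form k-nearest-neighbour simplex weighting replaces the learned sparse affinity, and ridge regression with row-norm scoring replaces the paper's feature selection term. The names `sparse_affinity`, `select_features`, `m`, `k`, and `lam` are hypothetical.

```python
import numpy as np


def sparse_affinity(Z, k=5):
    """Row-normalized affinity with at most k non-zeros per row.

    Assumed closed form: weights decrease linearly with squared distance
    and vanish at the (k+1)-th nearest neighbour. Requires n > k + 1.
    """
    n = Z.shape[0]
    sq = np.sum(Z ** 2, axis=1)
    # Pairwise squared Euclidean distances, clipped at zero for safety.
    D = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T, 0.0)
    np.fill_diagonal(D, np.inf)  # a sample is never its own neighbour
    S = np.zeros((n, n))
    for i in range(n):
        idx = np.argsort(D[i])[: k + 1]  # k neighbours plus the (k+1)-th
        d = D[i, idx]
        denom = k * d[k] - d[:k].sum()
        S[i, idx[:k]] = (d[k] - d[:k]) / max(denom, 1e-12)
    return S


def select_features(X, n_select, m=5, k=5, lam=1.0):
    """Return indices of the n_select top-scoring features and the soft labels."""
    Xc = X - X.mean(axis=0)
    # Stand-in subspace: top-m principal directions (assumption, not learned).
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:m].T
    S = sparse_affinity(Z, k)  # soft-label matrix, one row per sample
    # Ridge regression from the original features onto the soft labels.
    d = X.shape[1]
    W = np.linalg.solve(Xc.T @ Xc + lam * np.eye(d), Xc.T @ S)
    scores = np.linalg.norm(W, axis=1)  # one score per feature (row norm)
    return np.argsort(-scores)[:n_select], S
```

Calling `select_features(X, n_select=10)` on an `(n_samples, n_features)` array returns the chosen feature indices together with the soft-label matrix. The authors' repository linked above remains the reference for the actual method, which solves for the projection, affinity, and selection jointly.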

Keywords: Unsupervised feature selection, Dimension reduction, Fuzziness, Soft-label

Review history: Received 16 August 2020, Revised 20 January 2021, Accepted 23 January 2021, Available online 16 February 2021, Version of Record 1 March 2021.

Official paper URL: https://doi.org/10.1016/j.knosys.2021.106847