Unsupervised feature selection with multi-subspace randomization and collaboration
Authors:
Abstract
Unsupervised feature selection is an important technique in high-dimensional data analysis. Despite significant success, most existing unsupervised feature selection methods estimate the underlying structure of the data in the original feature space and lack the ability to explore the various subspaces of the high-dimensional space. In this paper, we argue that using a large number of random subspaces can significantly improve unsupervised feature selection accuracy. Specifically, we propose a new unsupervised feature selection approach based on multi-subspace randomization and collaboration. A balanced subspace randomization scheme is first presented to produce multiple basic feature partitions, each consisting of similar-sized random subspaces. Then, multiple k-nearest neighbor graphs are constructed in these subspaces, and the Laplacian scores of the features in each subspace are computed from these graphs w.r.t. their locality preserving power. Thereafter, the feature score vectors obtained from the different subspaces in the different basic feature partitions are integrated into a full score vector over all features, which takes into account the structure information of the various subspaces and can significantly enhance the performance of unsupervised feature selection. Experiments conducted on twenty high-dimensional datasets have demonstrated the high efficiency and robustness of our approach. The MATLAB source code is available at https://www.researchgate.net/publication/334520672.
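The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' MATLAB implementation. All function names, the heat-kernel edge weights, the default parameter values, and the integration step (a plain average of scores over partitions) are assumptions made for illustration, since the abstract does not specify them. The Laplacian score follows the standard definition, under which lower values indicate stronger locality preservation.

```python
import numpy as np

def balanced_partition(n_features, n_subspaces, rng):
    # Shuffle feature indices and split them into near-equal-sized subspaces.
    return np.array_split(rng.permutation(n_features), n_subspaces)

def knn_graph(X, k):
    # Symmetric k-nearest neighbor graph with heat-kernel weights
    # (the kernel and its bandwidth are assumptions, not taken from the paper).
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(d2, np.inf)  # exclude self-neighbors
    n = X.shape[0]
    idx = np.argpartition(d2, k - 1, axis=1)[:, :k]  # k closest samples per row
    rows, cols = np.repeat(np.arange(n), k), idx.ravel()
    sigma2 = np.mean(d2[rows, cols]) + 1e-12
    S = np.zeros((n, n))
    S[rows, cols] = np.exp(-d2[rows, cols] / sigma2)
    return np.maximum(S, S.T)

def laplacian_scores(X, S):
    # Standard Laplacian score per feature: f~' L f~ / f~' D f~,
    # where f~ is the feature centered by its degree-weighted mean.
    d = S.sum(axis=1)
    L = np.diag(d) - S
    Xc = X - (d @ X) / d.sum()
    num = np.einsum("ij,ij->j", Xc, L @ Xc)
    den = np.einsum("ij,ij->j", Xc, d[:, None] * Xc)
    # Constant features have zero weighted variance; rank them last.
    return np.where(den > 1e-12, num / np.maximum(den, 1e-12), np.inf)

def multi_subspace_scores(X, n_partitions=10, n_subspaces=4, k=5, seed=0):
    # Each partition covers every feature exactly once, so each partition
    # yields one full score vector; these are averaged (assumed integration).
    rng = np.random.default_rng(seed)
    total = np.zeros(X.shape[1])
    for _ in range(n_partitions):
        for feats in balanced_partition(X.shape[1], n_subspaces, rng):
            Xs = X[:, feats]
            total[feats] += laplacian_scores(Xs, knn_graph(Xs, k))
    return total / n_partitions
```

Under these assumptions, `np.argsort(multi_subspace_scores(X))[:m]` would return the indices of the `m` features with the lowest averaged scores. The sketch requires `k` to be smaller than the number of samples.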
Keywords: Unsupervised feature selection, High-dimensional data, Multi-subspace randomization and collaboration, Ensemble learning
Review history: Received 7 February 2019, Revised 13 July 2019, Accepted 17 July 2019, Available online 25 July 2019, Version of Record 9 September 2019.
Official paper URL: https://doi.org/10.1016/j.knosys.2019.07.027