Ensemble dimension reduction based on spectral disturbance for subspace clustering

Authors:

Highlights:

Abstract

The feature distribution of high dimension, small sample size (HDSS) data is sparse, which leads to unsatisfactory clustering results. Dimension reduction methods play an indispensable role in analyzing and visualizing high-dimensional data. However, directly reducing the dimension of an HDSS dataset is likely to cause matrix singularity in subspace clustering. Therefore, we construct multiple data subsets from the original HDSS dataset for ensemble dimension reduction. Projection least square regression subspace clustering (PLSR), which combines a projection technique with least-square regression, is used as the base dimension reducer for ensemble dimension reduction; the resulting method is called EPLSR. Considering the spectral properties of spectral clustering, we propose an ensemble dimension reduction method for subspace clustering based on spectral disturbance (SD-EPLSR). Following the theory of spectral disturbance, the weight coefficients are learned according to two principles: (1) the clustering result on each data subset should be close to the consensus clustering result; (2) data subsets with similar clustering results should have similar weights. Experiments on eight HDSS datasets demonstrate that our method is effective.
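The ensemble idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' PLSR or SD-EPLSR: the projection step and the spectral-disturbance weight learning are omitted because the abstract does not specify them. The sketch assumes that data subsets are built by random feature sampling (the abstract does not say how subsets are constructed), uses plain least-squares regression self-representation with its standard closed form as a stand-in base learner, and defaults to uniform weights as a placeholder for the learned ones. The names `lsr_affinity`, `ensemble_affinity`, `subset_dim`, and `lam` are hypothetical.

```python
import numpy as np


def lsr_affinity(X, lam=1.0):
    """Least-squares regression self-representation (stand-in base learner).

    X has shape (n_samples, n_features). Solves
        min_Z ||X^T - X^T Z||_F^2 + lam * ||Z||_F^2
    whose closed form is Z = (G + lam * I)^{-1} G with G = X X^T.
    The ridge term lam * I keeps the system non-singular even when
    n_features is far larger than n_samples.
    """
    G = X @ X.T                                  # (n, n) Gram matrix
    n = G.shape[0]
    Z = np.linalg.solve(G + lam * np.eye(n), G)  # self-representation
    return (np.abs(Z) + np.abs(Z.T)) / 2         # symmetric, nonnegative


def ensemble_affinity(X, n_subsets=5, subset_dim=50, lam=1.0,
                      weights=None, seed=0):
    """Weighted average of affinities computed on random feature subsets.

    Assumption: subsets are random feature samples, and weights are
    uniform unless supplied. SD-EPLSR learns the weights instead.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    k = min(subset_dim, d)
    affinities = []
    for _ in range(n_subsets):
        cols = rng.choice(d, size=k, replace=False)  # one data subset
        affinities.append(lsr_affinity(X[:, cols], lam))
    if weights is None:
        weights = np.full(n_subsets, 1.0 / n_subsets)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()                # normalize to sum 1
    consensus = np.tensordot(weights, np.stack(affinities), axes=1)
    return consensus, weights
```

The consensus affinity returned here could then be passed to any spectral clustering routine. In the method the abstract describes, the uniform weights would be replaced by weights learned under the two stated principles.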

Keywords: Projection subspace clustering, Ensemble dimension reduction, High dimension, Small sample size data, Spectral disturbance

Review history: Received 19 October 2020, Revised 25 May 2021, Accepted 26 May 2021, Available online 7 June 2021, Version of Record 12 June 2021.

Paper URL: https://doi.org/10.1016/j.knosys.2021.107182