Feature selection for noisy variation patterns using kernel principal component analysis

Authors:

Highlights:

Abstract

Kernel Principal Component Analysis (KPCA) is a technique widely used to understand and visualize non-linear variation patterns by inverse mapping the projected data from a high-dimensional feature space back to the original input space. Variation patterns often occur in a small number of relevant features out of the overall set of features recorded in the data. It is therefore crucial to discern this set of relevant features that define the pattern. Here we propose a feature selection procedure that augments KPCA to obtain importance estimates of the features given the noisy training data. Our feature selection strategy projects the data points onto sparse random vectors when calculating the kernel matrix. We then match pairs of such projections and determine the preimages of the data with and without a given feature; the differences between preimages within each pair are used to estimate that feature's importance and thereby identify the relevant features. An advantage of our method is that it can be used with any suitable KPCA algorithm. Moreover, the computations can be easily parallelized, leading to a significant speedup. We demonstrate our method on several simulated and real data sets, and compare the results to alternative approaches in the literature.
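As a rough illustration of the workflow described above (not the authors' exact procedure), the sketch below scores each feature by how much an approximate KPCA preimage changes when that feature is dropped from a sparse random feature mask. It relies on scikit-learn's KernelPCA with fit_inverse_transform=True for preimage approximation; the number of projections, kernel parameters, and sparsity level are hypothetical placeholders.

```python
# Hypothetical sketch: feature importance via preimage differences under
# sparse random feature masks. Uses scikit-learn's KernelPCA with
# fit_inverse_transform=True to approximate preimages; all parameters
# (n_projections, gamma, sparsity) are illustrative placeholders.
import numpy as np
from sklearn.decomposition import KernelPCA


def _kpca_preimage(X_masked, n_components=2, gamma=1.0):
    """Fit KPCA on the masked data and return approximate preimages."""
    kpca = KernelPCA(n_components=n_components, kernel="rbf", gamma=gamma,
                     fit_inverse_transform=True)
    scores = kpca.fit_transform(X_masked)
    return kpca.inverse_transform(scores)


def feature_importances(X, n_projections=20, sparsity=0.3, random_state=0):
    """Score each feature by the preimage change when it is masked out."""
    rng = np.random.default_rng(random_state)
    n_samples, n_features = X.shape
    importances = np.zeros(n_features)

    for _ in range(n_projections):
        # Sparse random mask: keep a random subset of features.
        mask = rng.random(n_features) < sparsity
        if mask.sum() < 2:
            continue
        for j in np.where(mask)[0]:
            mask_without_j = mask.copy()
            mask_without_j[j] = False
            # Preimages of the data with and without feature j.
            pre_with = _kpca_preimage(X * mask)
            pre_without = _kpca_preimage(X * mask_without_j)
            # Larger preimage differences suggest feature j matters more.
            importances[j] += np.linalg.norm(pre_with - pre_without)

    return importances / n_projections
```

Since each (projection, feature) pair is fitted independently, the inner loop is straightforward to parallelize, consistent with the speedup noted in the abstract.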

Keywords: Nonlinear PCA, Kernel feature space, Preimages, Variation patterns, Feature ensembles

Article history: Received 12 February 2014, Revised 5 August 2014, Accepted 29 August 2014, Available online 16 September 2014.

DOI: https://doi.org/10.1016/j.knosys.2014.08.027