Iterative column subset selection

Authors: Bruno Ordozgoiti, Sandra Gómez Canaval, Alberto Mozo

Abstract

Dimensionality reduction is often a crucial step for the successful application of machine learning and data mining methods. One way to achieve this reduction is feature selection. Because many data sets cannot be labelled, unsupervised approaches are frequently the only option. The column subset selection problem translates naturally to this purpose and has received considerable attention over the last few years, as it provides simple linear models for low-rank data reconstruction. Recently, it was shown empirically that an iterative algorithm, which can be implemented efficiently, provides better subsets than other state-of-the-art methods. In this paper, we describe this algorithm and provide a more in-depth analysis. We carry out numerous experiments to gain insight into its behaviour and derive a simple bound for the norm recovered by the resulting matrix. To the best of our knowledge, this is the first theoretical result of this kind for this algorithm.
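
As an illustration of the column subset selection objective mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes the usual formulation in which a subset of k columns C of a matrix A is scored by the Frobenius norm of A - C C^+ A, and it pairs that score with a naive swap-based local search. The function names, the swap strategy, the stopping rule, and all parameters are hypothetical choices made for this example; the efficient iterative algorithm analysed in the paper is not reproduced here.

```python
import numpy as np


def reconstruction_error(A, cols):
    """Frobenius norm of A - C C^+ A, where C = A[:, cols]."""
    C = A[:, cols]
    # lstsq returns the least-squares solution X of C X = A, so C @ X is the
    # projection of every column of A onto the span of the selected columns.
    X, *_ = np.linalg.lstsq(C, A, rcond=None)
    return np.linalg.norm(A - C @ X, ord="fro")


def iterative_swap_selection(A, k, max_sweeps=10, seed=0):
    """Naive local search (an assumption for illustration only).

    Starts from a random subset of k columns and repeatedly tries to replace
    each selected column with an unselected one, keeping a swap whenever it
    lowers the reconstruction error.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    cols = [int(c) for c in rng.choice(n, size=k, replace=False)]
    best = reconstruction_error(A, cols)
    for _ in range(max_sweeps):
        improved = False
        for i in range(k):
            for j in range(n):
                if j in cols:
                    continue
                trial = cols.copy()
                trial[i] = j
                err = reconstruction_error(A, trial)
                if err < best - 1e-12:
                    cols, best, improved = trial, err, True
        if not improved:
            break  # no single swap helps: a local optimum for this search
    return sorted(cols), best
```

Calling `iterative_swap_selection(A, k)` on a data matrix `A` whose columns are features returns the selected column indices together with the corresponding reconstruction error. This brute-force version recomputes a least-squares fit for every candidate swap, so it is only practical for small matrices.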

Keywords: Column subset selection, Unsupervised feature selection, Dimensionality reduction, Machine learning, Data mining

Paper URL: https://doi.org/10.1007/s10115-017-1115-4