3-3FS: ensemble method for semi-supervised multi-label feature selection

摘要

Feature selection has received considerable attention over the past decade. However, it is continuously challenged by new emerging issues. Semi-supervised multi-label learning is one of these promising novel approaches. In this work, we refer to it as an approach that combines data consisting of a huge amount of unlabeled instances with a small number of multi-labeled instances. Semi-supervised multi-label feature selection, like conventional feature selection algorithms, has a rather poor record as regards stability (i.e. robustness with respect to changes in data). To address this weakness and improve the robustness of the feature selection process in high-dimensional data, this document develops an ensemble methodology based on a 3-way resampling of data: (1) Bagging, (2) a random subspace method (RSM) and (3) an additional random sub-labeling strategy (RSL). The proposed framework contributes to enhancing the stability of feature selection algorithms and to improving their performance. Our research findings illustrate that bagging and RSM help improve the stability of the feature selection process and increase learning accuracy, while RSL addresses label correlation, which is a major concern with multi-label data. The paper presents the key findings of a series of experiments, which we conducted on selected benchmark data sets in the classification task. Results are promising, highlighting that the proposed method either outperforms state-of-the-art algorithms or produces at least comparable results.