Mutual information-based label distribution feature selection for multi-label learning

Abstract

Feature selection for dimensionality reduction plays an important role in multi-label learning, where high-dimensional data are involved. Most existing multi-label feature selection approaches deal with label ambiguity by assuming that logical labels are uniformly distributed, so they cannot be applied to many practical applications in which the significance of each related label differs from instance to instance. To address this issue, this study introduces label distribution learning, in which labels take real values, to model labeling significance. However, multi-label feature selection is limited to handling logical labels. To solve this problem, we combine random variable distributions with granular computing and first propose a label enhancement algorithm that transforms the logical labels in multi-label data into label distributions carrying more supervised information, thereby mining the hidden label significance of every instance. On this basis, to remove redundant or irrelevant features from multi-label data, we develop a label distribution feature selection algorithm that uses mutual information and label enhancement. Finally, experimental results show that the proposed method outperforms other state-of-the-art approaches on multi-label data.
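
The role of mutual information in feature selection can be illustrated with a minimal sketch. This is not the algorithm proposed in the paper: it omits label enhancement and granular computing entirely, and simply ranks discrete features by their summed empirical mutual information with each logical label. The function names `mutual_information` and `rank_features`, and the toy data, are assumptions made for illustration only.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y) in bits for two discrete sequences."""
    n = len(xs)
    px = Counter(xs)             # marginal counts of X
    py = Counter(ys)             # marginal counts of Y
    pxy = Counter(zip(xs, ys))   # joint counts of (X, Y)
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log2(p(x,y) / (p(x) * p(y))), written in terms of counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def rank_features(features, labels):
    """Return feature indices sorted by summed MI with all labels, highest first."""
    scores = []
    for j, col in enumerate(features):
        total = sum(mutual_information(col, lab) for lab in labels)
        scores.append((total, j))
    # Sort by descending score; ties are broken by the lower feature index
    return [j for _, j in sorted(scores, key=lambda s: (-s[0], s[1]))]


# Hypothetical toy data: each inner list is one column over 4 instances
features = [
    [0, 0, 1, 1],  # feature 0: identical to label 0
    [0, 1, 0, 1],  # feature 1: statistically independent of label 0
]
labels = [
    [0, 0, 1, 1],  # label 0 (logical label)
]
ranking = rank_features(features, labels)
```

In this toy case feature 0 carries one full bit of information about the label and feature 1 carries none, so feature 0 is ranked first. A label distribution variant would replace the logical label columns with real-valued label degrees, which requires a different estimator than the count-based one assumed here.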

Keywords: Feature selection, Multi-label data, Granular computing, Label enhancement, Mutual information

Review history: Received 30 September 2019, Revised 18 February 2020, Accepted 20 February 2020, Available online 25 February 2020, Version of Record 4 April 2020.

Paper URL: https://doi.org/10.1016/j.knosys.2020.105684