Feature selection for hierarchical classification via joint semantic and structural information of labels

Abstract

Hierarchical classification is widely used in real-world applications in which the label space is organized as a tree or a directed acyclic graph (DAG) and each label carries a rich semantic description. Feature selection, a dimensionality reduction technique, has proven effective in improving the performance of machine learning algorithms. However, many existing feature selection methods cannot be applied directly to hierarchical classification because they ignore the hierarchical relations among labels and make no use of the semantic information in the label space. In this paper, we propose a novel feature selection framework based on the semantic and structural information of labels. First, we transform each label description into a mathematical representation and compute similarity scores between labels, which serve as the semantic regularization. Second, we exploit the hierarchical relations in the tree structure of the label space as the structural regularization. Finally, we impose both regularization terms on a sparse-learning-based model for feature selection. Additionally, we adapt the proposed model to the DAG case, which makes the method more general and robust across real-world tasks. Experimental results on real-world datasets demonstrate the effectiveness of the proposed framework for hierarchical classification.
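The abstract does not state the objective function, so the following is only an illustrative sketch of how two label-based regularizers can be combined with a sparse model, not the authors' formulation. It assumes a least-squares loss, graph-Laplacian penalties built from a cosine similarity matrix (semantic) and a parent-child adjacency matrix (structural), an l2,1-norm for row sparsity, and a proximal gradient solver. All function names, the weights `alpha`, `beta`, `gamma`, and the toy data are hypothetical.

```python
import numpy as np


def cosine_similarity(label_vectors):
    """Pairwise cosine similarity between rows (one row per label description)."""
    norms = np.linalg.norm(label_vectors, axis=1, keepdims=True)
    unit = label_vectors / np.maximum(norms, 1e-12)
    # Clip negatives so the resulting graph Laplacian stays positive semidefinite.
    return np.clip(unit @ unit.T, 0.0, None)


def tree_adjacency(parents):
    """Symmetric parent-child adjacency; parents[j] is label j's parent, -1 for a root."""
    c = len(parents)
    a = np.zeros((c, c))
    for child, parent in enumerate(parents):
        if parent >= 0:
            a[child, parent] = a[parent, child] = 1.0
    return a


def label_laplacian(similarity):
    """Graph Laplacian L = D - S of a symmetrized similarity matrix S."""
    s = (similarity + similarity.T) / 2.0
    np.fill_diagonal(s, 0.0)
    return np.diag(s.sum(axis=1)) - s


def select_features(X, Y, S_sem, A_struct, alpha=0.1, beta=0.1, gamma=1.0, n_iter=500):
    """Assumed objective (not taken from the paper):
        ||XW - Y||_F^2 + alpha*tr(W L_sem W^T) + beta*tr(W L_str W^T) + gamma*||W||_{2,1}
    Returns feature indices ranked by the row norms of W (largest first), and W."""
    L = alpha * label_laplacian(S_sem) + beta * label_laplacian(A_struct)
    # Lipschitz constant of the gradient of the smooth part gives a safe step size.
    lipschitz = 2.0 * (np.linalg.norm(X, 2) ** 2 + np.linalg.eigvalsh(L).max())
    step = 1.0 / lipschitz
    W = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        grad = 2.0 * X.T @ (X @ W - Y) + 2.0 * W @ L
        V = W - step * grad
        # Proximal step for the l2,1-norm: shrink each row of V toward zero.
        row_norms = np.linalg.norm(V, axis=1, keepdims=True)
        shrink = np.maximum(0.0, 1.0 - step * gamma / np.maximum(row_norms, 1e-12))
        W = shrink * V
    scores = np.linalg.norm(W, axis=1)
    return np.argsort(-scores), W


# Toy usage: 3 labels (label 0 is the root, labels 1 and 2 are its children),
# 8 features, of which only features 0 and 1 carry signal.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 8))
W_true = np.zeros((8, 3))
W_true[0] = [1.0, -1.0, 0.5]
W_true[1] = [-0.5, 1.0, 1.0]
Y = X @ W_true
S_sem = cosine_similarity(np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]))
A_struct = tree_adjacency([-1, 0, 0])
ranking, W = select_features(X, Y, S_sem, A_struct)
```

In this sketch the Laplacian terms pull the weight columns of similar or adjacent labels toward each other, while the l2,1 penalty drives whole rows of `W` to zero so that features can be ranked by row norm. The `tree_adjacency` helper handles only single-parent trees; a DAG would need an edge list that allows several parents per label.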

Keywords: Feature selection, Hierarchical classification, Label semantic similarity, Label hierarchical structure

Review history: Received 16 August 2019, Revised 9 February 2020, Accepted 11 February 2020, Available online 13 February 2020, Version of Record 4 April 2020.

Paper URL: https://doi.org/10.1016/j.knosys.2020.105655