Cost-sensitive hierarchical classification via multi-scale information entropy for data with an imbalanced distribution

作者:Weijie Zheng, Hong Zhao

摘要

Imbalanced distributions present a great problem in machine learning classification tasks. Various algorithms based on cost-sensitive learning have been developed to address the imbalanced distribution problem. However, classes with a hierarchical tree structure create a new challenge for cost-sensitive learning. In this paper, we propose a cost-sensitive hierarchical classification method based on multi-scale information entropy. We construct an information entropy threshold for each level in the tree structure and assign cost-sensitive weights accordingly. First, we use the class hierarchy to divide a large hierarchical classification problem into several smaller sub-classification problems. In this way, a large-scale classification task can be decomposed into multiple, controllable, small-scale classification tasks. Second, we use a logistic regression algorithm to obtain the probabilities of classes at each level. Then, we consider the information entropy at each level as a threshold, which decreases inter-level error propagation in the tree structure. Finally, we design a cost-sensitive model based on the information of each class and use hierarchical information entropy weights as cost-sensitive weights. Information entropy measures the information of the majority and minority classes and allocates them different cost weights to solve imbalanced distribution problems. Experiments on four imbalanced distribution datasets demonstrate that the cost-sensitive hierarchical classification algorithm provides excellent efficiency and effectiveness.

论文关键词:Imbalanced distribution, Information entropy, Cost-sensitive, Hierarchical classification

论文评审过程:

论文官网地址:https://doi.org/10.1007/s10489-020-02089-1