Annotation cost-sensitive active learning by tree sampling

Authors: Yu-Lin Tsou, Hsuan-Tien Lin

Abstract

Active learning is an important machine learning setup for reducing human labelling effort. Most existing works assume that every labelling query has the same annotation cost, but this assumption may not be realistic: annotation costs may vary between data instances, and they may be unknown before the query is made. Traditional active learning algorithms cannot deal with such a scenario. In this work, we study annotation cost-sensitive active learning algorithms, which need to estimate the utility and the cost of each query simultaneously. We propose a novel algorithm, the cost-sensitive tree sampling algorithm, that performs the two estimation tasks jointly using a tree-structured model motivated by hierarchical sampling, a well-known algorithm for traditional active learning. Extensive experiments on datasets with simulated and true annotation costs show that the proposed method is generally superior to other annotation cost-sensitive algorithms.

Keywords: Annotation cost-sensitive, Active learning, Clustering, Decision tree

Paper URL: https://doi.org/10.1007/s10994-019-05781-7