Asymptotic analysis of estimators on multi-label data

作者:Andreas P. Streich, Joachim M. Buhmann

摘要

Multi-label classification extends the standard multi-class classification paradigm by dropping the assumption that classes have to be mutually exclusive, i.e., the same data item might belong to more than one class. Multi-label classification has many important applications in e.g. signal processing, medicine, biology and information security, but the analysis and understanding of the inference methods based on data with multiple labels are still underdeveloped. In this paper, we formulate a general generative process for multi-label data, i.e. we associate each label (or class) with a source. To generate multi-label data items, the emissions of all sources in the label set are combined. In the training phase, only the probability distributions of these (single label) sources need to be learned. Inference on multi-label data requires solving an inverse problem, models of the data generation process therefore require additional assumptions to guarantee well-posedness of the inference procedure. Similarly, in the prediction (test) phase, the distributions of all single-label sources in the label set are combined using the combination function to determine the probability of a label set. We formally describe several previously presented inference methods and introduce a novel, general-purpose approach, where the combination function is determined based on the data and/or on a priori knowledge of the data generation mechanism. This framework includes cross-training and new source training (also named label power set method) as special cases. We derive an asymptotic theory for estimators based on multi-label data and investigate the consistency and efficiency of estimators obtained by several state-of-the-art inference techniques. Several experiments confirm these findings and emphasize the importance of a sufficiently complex generative model for real-world applications.

论文关键词:Generative model, Asymptotic analysis, Multi-label classification, Consistency

论文评审过程:

论文官网地址:https://doi.org/10.1007/s10994-014-5457-9