Multi-label collective classification via Markov chain based learning method

Abstract

In this paper, we study the problem of multi-label collective classification (MLCC), in which instances are interrelated and each instance can be associated with multiple class labels. Such correlation of class labels among interrelated instances arises in a wide variety of data. For example, a web page can belong to multiple categories because its semantics can be interpreted in different ways, and linked web pages are more likely than unlinked pages to share the same classes. We propose a novel and effective Markov-chain-based learning method for MLCC problems. The idea is to model the problem as a Markov chain with restart on transition probability graphs and to propagate the ranking scores of labeled instances to unlabeled instances according to the affinity among instances. This affinity is constructed explicitly from both attribute features derived from the content of instances and correlation features constructed from the links between instances. Intuitively, an instance whose linked neighbors are highly similar to other instances ranked highly for a particular class label is likely to have that label. Extensive experiments on two DBLP datasets demonstrate the effectiveness of the proposed algorithm: on the tested MLCC problems, it outperforms the binary relevance multi-label algorithm, the collective classification algorithms wvRN, ICA and Gibbs, and the ICML algorithm.
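The abstract does not give the paper's exact formulation, so the following is only a minimal Python sketch of the general idea it describes, under stated assumptions. Affinity is assumed to be a weighted mix (hypothetical weight `beta`) of cosine similarity between attribute vectors and cosine similarity between neighbourhood indicator vectors built from the links. It is row-normalised into a transition matrix, and one random walk with restart is run per class label, restarting at the labeled instances that carry that label. The final thresholding rule (`ratio`) is also an assumption. All function and parameter names (`affinity`, `transition`, `rwr_scores`, `predict`, `beta`, `restart`, `ratio`) are illustrative and do not come from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def affinity(features, links, beta=0.5):
    """Hypothetical affinity: beta * content similarity + (1 - beta) * link similarity.

    Link similarity compares neighbourhood indicator vectors (an instance's
    adjacency row with the instance itself included). This stands in for the
    paper's link-based correlation features, which the abstract does not specify.
    """
    n = len(features)
    nbr = [[1.0 if (i == j or links[i][j]) else 0.0 for j in range(n)] for i in range(n)]
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:  # no self-affinity
                W[i][j] = (beta * cosine(features[i], features[j])
                           + (1 - beta) * cosine(nbr[i], nbr[j]))
    return W


def transition(W):
    """Row-normalise W into transition probabilities, P[i][j] = Pr(step i -> j)."""
    n = len(W)
    P = []
    for i, row in enumerate(W):
        total = sum(row)
        if total > 0:
            P.append([w / total for w in row])
        else:  # isolated instance: a self-loop keeps the row stochastic
            P.append([1.0 if j == i else 0.0 for j in range(n)])
    return P


def rwr_scores(P, labeled, num_classes, restart=0.15, iters=200):
    """Random walk with restart, run once per class label.

    `labeled` maps instance index -> set of class labels. For class c the walk
    restarts uniformly at the labeled instances carrying c, so their score mass
    is propagated to the other instances along P. Returns scores[i][c].
    """
    n = len(P)
    scores = [[0.0] * num_classes for _ in range(n)]
    for c in range(num_classes):
        seeds = {i for i, labs in labeled.items() if c in labs}
        if not seeds:
            continue
        r = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]
        x = r[:]
        for _ in range(iters):
            # Power iteration: x <- (1 - restart) * P^T x + restart * r
            x = [(1 - restart) * sum(P[j][i] * x[j] for j in range(n)) + restart * r[i]
                 for i in range(n)]
        for i in range(n):
            scores[i][c] = x[i]
    return scores


def predict(scores, labeled, ratio=0.5):
    """Hypothetical decision rule: give each unlabeled instance every label whose
    score is at least `ratio` times that instance's best score."""
    out = {}
    for i, row in enumerate(scores):
        if i in labeled:
            continue
        best = max(row)
        out[i] = {c for c, s in enumerate(row) if best > 0 and s >= ratio * best}
    return out


if __name__ == "__main__":
    # Toy data: instances 0-1 and 2-3 form two linked, similar pairs.
    features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    links = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
    labeled = {0: {0}, 2: {1}}
    P = transition(affinity(features, links))
    print(predict(rwr_scores(P, labeled, num_classes=2), labeled))  # {1: {0}, 3: {1}}
```

Running one walk per label is the simplest way to obtain a per-label ranking of instances. It does not model dependencies among the labels themselves, which a fuller treatment of MLCC may need.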

Keywords: Machine learning, Multi-label collective classification, Multi-label learning, Collective classification, Markov chain with restart

Article history: Received 23 January 2013, Revised 18 December 2013, Accepted 17 February 2014, Available online 29 March 2014.

Official URL: https://doi.org/10.1016/j.knosys.2014.02.012