Hierarchically SVM classification based on support vector clustering method and its application to document categorization

作者:

Highlights:

摘要

Automatic categorization of documents into pre-defined topic hierarchies or taxonomies is a crucial step in knowledge and content management. Standard machine learning techniques like support vector machines and related large margin methods have been successfully applied for this task, albeit the fact is that they ignore the inter-class relationships. Unfortunately, in the context of document categorization, we face a large number of classes and a huge number of relevant features needed to distinguish between them. The computational cost of training a classifier for a problem of this size is prohibitive. It has also been observed that obtaining a classifier that discriminates between two groups of classes is much easier than distinguishing simultaneously among all classes. This has prompted substantial research in using hierarchical classifiers to address single multi-class problems. In this paper, we propose a novel hierarchical classification method that generalizes support vector machine learning that is based on the results of support vector clustering method, and are structured in a way that mirrors the class hierarchy. Compared to previous non-hierarchical SVM classifier and famous documents categorization systems, the proposed hierarchical SVM classification has a better improvement in classification accuracy in the standard Reuters corpus.

论文关键词:Information retrieval,Document categorization,Hierarchical classification,Support vector machines,Support vector clustering method,Machine learning

论文评审过程:Available online 11 July 2006.

论文官网地址:https://doi.org/10.1016/j.eswa.2006.06.009