Hierarchical document classification using automatically generated hierarchy

Authors: Tao Li, Shenghuo Zhu, Mitsunori Ogihara

Abstract

Interest in automated text categorization has grown rapidly with the exponential growth of information and the ever-increasing need to organize it. A hierarchical structure over the categories identifies the dependence relationships between them and provides a valuable source of information for categorization. Although considerable research has been conducted on hierarchical document categorization, little has been done on the automatic generation of topic hierarchies. In this paper, we propose using linear discriminant projection to generate more meaningful intermediate levels of hierarchy in large flat sets of classes. The linear discriminant projection approach first transforms all documents into a low-dimensional space and then clusters the categories into hierarchies accordingly. The paper also investigates the effect of using the generated hierarchical structure for text classification. Our experiments show that the generated hierarchies improve classification performance in most cases.
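
The second step described in the abstract, clustering categories once the documents sit in a low-dimensional space, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the documents have already been projected (the linear discriminant projection itself is not shown), and it swaps in a simple greedy agglomerative clustering of category centroids under Euclidean distance. The function names `centroid` and `build_hierarchy`, the category names, and the 2-D coordinates are all hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import dist


def centroid(points):
    """Return the mean of a list of equal-length vectors."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def build_hierarchy(projected_docs):
    """Greedily merge the two categories (or groups) with the closest centroids.

    projected_docs maps a category name to its document vectors, which are
    ASSUMED to be already projected into a low-dimensional space.
    Returns a nested tuple of category names, i.e. a binary tree.
    """
    # Each cluster is (subtree, centroid, number of documents).
    clusters = [(name, centroid(docs), len(docs))
                for name, docs in projected_docs.items()]
    while len(clusters) > 1:
        # Find the pair of clusters whose centroids are closest.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: dist(clusters[p[0]][1], clusters[p[1]][1]))
        (t1, c1, n1), (t2, c2, n2) = clusters[i], clusters[j]
        # The merged centroid is the document-count-weighted mean of the two.
        merged_centroid = tuple((a * n1 + b * n2) / (n1 + n2)
                                for a, b in zip(c1, c2))
        merged = ((t1, t2), merged_centroid, n1 + n2)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


# Hypothetical toy data: four flat categories with 2-D projected documents.
toy = {
    "hockey": [(0.0, 0.1), (0.2, 0.0)],
    "baseball": [(0.3, 0.2), (0.1, 0.3)],
    "graphics": [(5.0, 5.1), (5.2, 4.9)],
    "hardware": [(4.8, 5.3), (5.1, 5.0)],
}
hierarchy = build_hierarchy(toy)
```

On this toy input the two sports categories end up under one intermediate node and the two computing categories under another, which is the kind of intermediate level a flat class set lacks. A hierarchical classifier could then be trained per internal node, but that step is outside this sketch.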

Keywords: text categorization, linear discriminant projection, document classification, hierarchy generation

Official paper URL: https://doi.org/10.1007/s10844-006-0019-7