Text categorization: the assignment of subject descriptors to magazine articles
Authors:
Abstract
Automatic text categorization is an important research area with potential for many text-based applications, including text routing and filtering. Typical text classifiers learn from example texts that have been manually categorized. In categorizing magazine articles with broad subject descriptors, we study three aspects of text classification: (1) effective selection of feature words and proper names that reflect the main topics of the text; (2) learning algorithms; and (3) improving the quality of the learned classifier by selecting examples. The χ² test, which is sometimes used for selecting terms that are highly related to a text class, is applied in a novel way when constructing a category weight vector. Despite a limited number of training examples, combining effective feature selection with the χ² learning algorithm for training the text classifier results in adequate categorization of new magazine articles.
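As an illustration of the conventional use of the χ² test mentioned above (selecting terms that are highly related to a text class), the following is a minimal sketch. It is not the paper's category weight vector construction, which the abstract does not detail. The function names `chi_square` and `rank_terms` and the toy corpus are hypothetical and chosen only for illustration.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 term/category contingency table.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # A row or column of the table is empty: no evidence of association.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def rank_terms(docs, category):
    """Rank vocabulary terms by chi-square association with `category`.

    docs: list of (set_of_terms, label) pairs (an assumed toy representation).
    Note that the statistic is symmetric: a term that is strongly *absent*
    from the category scores as high as one that is strongly present.
    """
    in_class = [terms for terms, label in docs if label == category]
    out_class = [terms for terms, label in docs if label != category]
    vocab = set().union(*(terms for terms, _ in docs))
    scores = {}
    for term in vocab:
        a = sum(term in t for t in in_class)
        b = sum(term in t for t in out_class)
        c = len(in_class) - a
        d = len(out_class) - b
        scores[term] = chi_square(a, b, c, d)
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical four-document corpus with two subject descriptors.
docs = [
    ({"election", "senate", "vote"}, "politics"),
    ({"election", "budget"}, "politics"),
    ({"goal", "match", "vote"}, "sports"),
    ({"match", "season"}, "sports"),
]
ranked = rank_terms(docs, "politics")
```

In this toy corpus, "election" appears in every politics document and in no sports document, so it receives the maximum score, while "vote" is evenly split across both classes and scores zero. A real feature selector would typically keep only the top-scoring terms per category and might also check the direction of the association.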
Keywords: Automatic indexing, Text categorization, Machine learning
Article history: Received 26 May 1999, Accepted 26 January 2000, Available online 28 July 2000.
Official paper URL: https://doi.org/10.1016/S0306-4573(00)00012-1