Text categorization algorithms using semantic approaches, corpus-based thesaurus and WordNet

Authors:

Abstract

In this paper, a corpus-based thesaurus and WordNet were used to improve text categorization performance. We employed the k-NN algorithm and the back propagation neural network (BPNN) algorithm as classifiers. k-NN is a simple and well-known approach to categorization, and the BPNN has been widely used in categorization and pattern recognition. However, the standard BPNN has some generally acknowledged limitations: it trains slowly and is easily trapped in a local minimum. To alleviate these problems, two modified versions, the Morbidity neurons Rectified BPNN (MRBP) and the Learning Phase Evaluation BPNN (LPEBP), were considered and applied to text categorization. We conducted experiments on both the standard Reuters-21578 data set and the 20 Newsgroups data set. Experimental results showed that our proposed methods achieved high categorization effectiveness as measured by precision, recall and F-measure.
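As an illustration of the k-NN classifier and the evaluation measures named in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes plain bag-of-words term counts and cosine similarity as stand-ins for the thesaurus- and WordNet-based features, which the abstract does not specify. The function names (`vectorize`, `cosine`, `knn_predict`, `precision_recall_f1`) and the toy documents are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Lowercased bag-of-words term counts (assumed stand-in for the paper's features)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def knn_predict(train, text, k=3):
    """Majority vote among the k training documents most similar to `text`.

    `train` is a list of (document_text, label) pairs.
    """
    query = vectorize(text)
    ranked = sorted(train, key=lambda d: cosine(vectorize(d[0]), query), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def precision_recall_f1(gold, predicted, category):
    """Per-category precision, recall and F-measure (F1, the balanced variant)."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == category and p == category for g, p in pairs)
    fp = sum(g != category and p == category for g, p in pairs)
    fn = sum(g == category and p != category for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy training set (hypothetical, not taken from Reuters-21578 or 20 Newsgroups).
train = [
    ("wheat corn harvest", "grain"),
    ("corn prices rise", "grain"),
    ("oil barrel crude", "energy"),
    ("crude oil exports", "energy"),
]
label = knn_predict(train, "corn harvest report", k=3)
```

A semantic approach of the kind the abstract describes would presumably replace `vectorize` with a representation that maps related terms to shared concepts, leaving the voting and scoring steps unchanged.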

Keywords: Text categorization, Corpus-based thesaurus, WordNet, k-NN, BPNN, Neural network

Article history: Available online 22 July 2011.

Official URL: https://doi.org/10.1016/j.eswa.2011.07.070