An HMM-based over-sampling technique to improve text classification

Highlights:

• An over-sampling balancing method based on document content is proposed.

• The technique includes an HMM that generates samples based on existing documents.

• The model is tested with an SVM classifier on two medical document collections.

• Results show the method outperforms other widely used data balancing techniques.
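The highlights do not describe the model's internal structure, so the following is only a minimal sketch, assuming a generic discrete HMM. It is fitted with scaled Baum-Welch to the tokenized minority-class documents and then sampled to produce synthetic documents whose lengths are drawn from the real ones. The number of hidden states, the helper names (`train_hmm`, `sample_doc`, `hmm_oversample`) and the toy medical tokens are hypothetical illustrations, not the paper's method.

```python
import random

# Hypothetical sketch: generic HMM-based over-sampling, not the paper's exact model.


def train_hmm(docs, vocab, n_states=2, n_iter=20, seed=0):
    """Fit a discrete HMM (pi, A, B) to tokenized documents with scaled Baum-Welch."""
    rng = random.Random(seed)
    N, V = n_states, len(vocab)
    index = {w: k for k, w in enumerate(vocab)}

    def rand_dist(n):
        xs = [rng.random() + 0.1 for _ in range(n)]
        total = sum(xs)
        return [x / total for x in xs]

    pi = rand_dist(N)
    A = [rand_dist(N) for _ in range(N)]
    B = [rand_dist(V) for _ in range(N)]
    seqs = [[index[w] for w in d] for d in docs if d]

    for _ in range(n_iter):
        pi_acc = [0.0] * N
        a_num = [[0.0] * N for _ in range(N)]
        b_num = [[0.0] * V for _ in range(N)]
        g_all = [0.0] * N    # sum of gamma over all t (denominator for B)
        g_trans = [0.0] * N  # sum of gamma over t < T-1 (denominator for A)
        for obs in seqs:
            T = len(obs)
            # Forward pass, rescaled at each step so alpha[t] sums to 1.
            alpha, scale = [], []
            for t in range(T):
                if t == 0:
                    a = [pi[i] * B[i][obs[0]] for i in range(N)]
                else:
                    a = [sum(alpha[t - 1][j] * A[j][i] for j in range(N)) * B[i][obs[t]]
                         for i in range(N)]
                c = sum(a)
                alpha.append([x / c for x in a])
                scale.append(c)
            # Backward pass using the same scaling factors.
            beta = [[1.0] * N for _ in range(T)]
            for t in range(T - 2, -1, -1):
                beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(N))
                           / scale[t + 1] for i in range(N)]
            # E-step: accumulate expected state occupancies and transitions.
            for t in range(T):
                gamma = [alpha[t][i] * beta[t][i] for i in range(N)]
                for i in range(N):
                    g_all[i] += gamma[i]
                    b_num[i][obs[t]] += gamma[i]
                    if t == 0:
                        pi_acc[i] += gamma[i]
                    if t < T - 1:
                        g_trans[i] += gamma[i]
                        for j in range(N):
                            a_num[i][j] += (alpha[t][i] * A[i][j] * B[j][obs[t + 1]]
                                            * beta[t + 1][j] / scale[t + 1])
        # M-step: re-estimate parameters; keep a row unchanged if it received no mass.
        pi = [p / len(seqs) for p in pi_acc]
        A = [[a_num[i][j] / g_trans[i] for j in range(N)] if g_trans[i] > 0 else A[i]
             for i in range(N)]
        B = [[b_num[i][k] / g_all[i] for k in range(V)] if g_all[i] > 0 else B[i]
             for i in range(N)]
    return pi, A, B


def sample_doc(pi, A, B, vocab, length, rng):
    """Generate one synthetic token sequence by walking the HMM."""
    def draw(p):
        return rng.choices(range(len(p)), weights=p)[0]

    state, words = draw(pi), []
    for _ in range(length):
        words.append(vocab[draw(B[state])])
        state = draw(A[state])
    return words


def hmm_oversample(minority_docs, n_new, n_states=2, seed=0):
    """Hypothetical helper: fit an HMM to minority-class docs and sample n_new new docs."""
    vocab = sorted({w for d in minority_docs for w in d})
    pi, A, B = train_hmm(minority_docs, vocab, n_states=n_states, seed=seed)
    rng = random.Random(seed + 1)
    lengths = [len(d) for d in minority_docs if d]  # reuse real document lengths
    return [sample_doc(pi, A, B, vocab, rng.choice(lengths), rng) for _ in range(n_new)]


if __name__ == "__main__":
    minority = [
        ["fever", "cough", "chest", "pain"],
        ["cough", "fever", "fatigue"],
        ["chest", "pain", "fever", "cough"],
    ]
    for doc in hmm_oversample(minority, n_new=3):
        print(" ".join(doc))
```

The synthetic token sequences would then be vectorized together with the real documents before training the classifier (an SVM in the experiments described above). Drawing lengths from the real minority documents is just one simple design choice; a fixed or modeled length distribution would work as well.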

Keywords: Hidden Markov Model, Text classification, Oversampling techniques

Publication history: Available online 17 July 2013.

Official URL: https://doi.org/10.1016/j.eswa.2013.07.036