A novel dual wing harmonium model aided by 2-D wavelet transform subbands for document data mining


Abstract

A novel dual wing harmonium model is proposed for document classification and retrieval. It integrates multiple features, including term frequency features and 2-D wavelet transform features, into a low-dimensional semantic space. Terms are extracted from a graph representation of each document using a weighted feature extraction method. Because this graph is sparse, a 2-D wavelet transform is used to compress it while preserving the basic document structure. After the transform, the low-pass subbands are stacked to represent the term associations in a document. We then develop a new dual wing harmonium model that projects these multiple features onto low-dimensional latent topics, assuming a different probability distribution for each type of feature. The contrastive divergence algorithm is used for efficient learning and inference. We perform extensive experiments on document classification and retrieval, and the comparative results suggest that the proposed method outperforms the other methods evaluated.
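The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several choices are assumptions because the abstract does not specify them. The term-association graph is a hypothetical symmetric co-occurrence matrix (`cooccurrence_graph`, built over a sliding `window`) rather than the paper's weighted feature extraction. Only the LL (low-pass) subband of an orthonormal 2-D Haar transform is kept and flattened as the continuous "wavelet wing". The harmonium pairs Bernoulli term-presence units and unit-variance Gaussian wavelet units with unit-variance Gaussian topic units, and it is trained with CD-1. All function names (`haar_lowpass_2d`, `init_dwh`, `project_topics`, `cd1_step`) are illustrative.

```python
import numpy as np


def cooccurrence_graph(tokens, vocab, window=2):
    """Hypothetical term-association graph: symmetric counts of vocabulary
    terms that occur within `window` positions of each other."""
    index = {t: i for i, t in enumerate(vocab)}
    g = np.zeros((len(vocab), len(vocab)))
    for i, ti in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            u, v = index.get(ti), index.get(tokens[j])
            if u is None or v is None or u == v:
                continue  # skip out-of-vocabulary terms and self-loops
            g[u, v] += 1
            g[v, u] += 1
    return g


def haar_lowpass_2d(a, levels=1):
    """LL (low-pass) subband after `levels` passes of an orthonormal 2-D Haar DWT."""
    ll = np.asarray(a, dtype=float)
    for _ in range(levels):
        # zero-pad odd dimensions so the matrix tiles into 2x2 blocks
        ll = np.pad(ll, ((0, ll.shape[0] % 2), (0, ll.shape[1] % 2)))
        # orthonormal Haar low-pass on rows then columns = (2x2 block sum) / 2
        ll = (ll[0::2, 0::2] + ll[0::2, 1::2] + ll[1::2, 0::2] + ll[1::2, 1::2]) / 2.0
    return ll


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Assumed joint model (up to a constant):
#   log p(x, y, h) = a.x + b.y - |y|^2/2 + c.h - |h|^2/2 + x^T W h + y^T U h
# so p(h|x,y) = N(c + W^T x + U^T y, I), p(x_i=1|h) = sigmoid(a_i + W_i h),
# and p(y|h) = N(b + U h, I).
def init_dwh(n_terms, n_wave, n_topics, rng, scale=0.01):
    """Parameters of a Bernoulli/Gaussian dual wing harmonium with Gaussian topics."""
    return {
        "W": scale * rng.standard_normal((n_terms, n_topics)),  # term wing weights
        "U": scale * rng.standard_normal((n_wave, n_topics)),   # wavelet wing weights
        "a": np.zeros(n_terms), "b": np.zeros(n_wave), "c": np.zeros(n_topics),
    }


def project_topics(X, Y, p):
    """Mean of p(h | x, y): the low-dimensional topic vector of each document row."""
    return p["c"] + X @ p["W"] + Y @ p["U"]


def cd1_step(X, Y, p, lr, rng):
    """One contrastive-divergence (CD-1) update on a mini-batch (rows = documents)."""
    h0 = project_topics(X, Y, p)                    # positive phase: E[h | data]
    h_s = h0 + rng.standard_normal(h0.shape)        # sample Gaussian topics
    px = sigmoid(p["a"] + h_s @ p["W"].T)           # p(x_i = 1 | h)
    x1 = (rng.random(px.shape) < px).astype(float)  # sampled term wing
    y1 = p["b"] + h_s @ p["U"].T                    # mean of p(y | h) (mean-field)
    h1 = project_topics(x1, y1, p)                  # negative phase: E[h | recon]
    n = X.shape[0]
    # gradient = data statistics minus reconstruction statistics
    p["W"] += lr * (X.T @ h0 - x1.T @ h1) / n
    p["U"] += lr * (Y.T @ h0 - y1.T @ h1) / n
    p["a"] += lr * (X - x1).mean(axis=0)
    p["b"] += lr * (Y - y1).mean(axis=0)
    p["c"] += lr * (h0 - h1).mean(axis=0)
```

In use, each document row of `X` would be a term-presence vector over a fixed vocabulary, and the matching row of `Y` would be `haar_lowpass_2d(cooccurrence_graph(tokens, vocab)).ravel()`. After a number of `cd1_step` calls, `project_topics(X, Y, p)` gives the low-dimensional vectors that a classifier or a retrieval similarity measure would operate on. Using reconstruction means for the Gaussian wing and hidden statistics is a common variance-reducing choice in CD, not something the abstract states.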

Keywords: Dual wing harmonium, 2-D wavelet, Term association, Graph representation, Document data, Multiple features

Publication history: Available online 27 November 2009.

Official page: https://doi.org/10.1016/j.eswa.2009.11.088