Independent component analysis for near-synonym choice

作者:

Highlights:

摘要

Despite their similar meanings, near-synonyms may have different usages in different contexts, and the development of algorithms that can verify whether near-synonyms do match their given contexts has been the focus of increasing concern. Such algorithms have many applications such as query expansion for information retrieval (IR), alternative word selection for writing support systems, and (near-)duplicate detection for text summarization. In this paper, we propose a framework that incorporates latent semantic analysis (LSA) and independent component analysis (ICA) to automatically select suitable near-synonyms according to the given context. LSA is used to discover useful latent features that do not frequently occur in the contexts of near-synonyms, and ICA is used to estimate a set of independent components by minimizing the dependence between features. An SVM classifier is then trained with the independent components for best near-synonym prediction. In experiments, we evaluate the proposed method on both Chinese and English sentences, and compare its performance to state-of-the-art supervised and unsupervised methods. Experimental results show that training on the independent components that contain useful contextual features with minimized term dependence can improve the classifiers' ability to discriminate among near-synonyms, thus yielding better performance.

论文关键词:Near-synonym choice,Independent component analysis,Information retrieval,Natural language processing

论文评审过程:Received 3 March 2012, Revised 21 October 2012, Accepted 31 December 2012, Available online 11 January 2013.

论文官网地址:https://doi.org/10.1016/j.dss.2012.12.038