Unsupervised word sense disambiguation with N-gram features

Authors: Daniel Preotiuc-Pietro, Florentina Hristea

Abstract

The present paper concentrates on the issue of feature selection for unsupervised word sense disambiguation (WSD) performed with an underlying Naïve Bayes model. It introduces web N-gram features which, to our knowledge, are used for the first time in unsupervised WSD. While creating features from unlabeled data, we are “helping” a simple, basic knowledge-lean disambiguation algorithm to significantly increase its accuracy as a result of receiving easily obtainable knowledge. The performance of this method is compared to that of others that rely on completely different feature sets. Test results concerning nouns, adjectives and verbs show that web N-gram feature selection is a reliable alternative to previously existing approaches, provided that a “quality list” of features, adapted to the part of speech, is used.
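
To make the setup in the abstract concrete, the following is a minimal sketch of unsupervised sense induction with a Naïve Bayes mixture trained by EM, preceded by a feature-filtering step. It is an illustration under stated assumptions, not the authors' implementation: the function names (select_features, em_naive_bayes), the smoothing constant alpha, the random initialisation, and the idea of ranking candidate features by a toy co-occurrence count table are all hypothetical stand-ins for the web N-gram feature selection the paper describes.

```python
import math
import random


def select_features(cooccurrence_counts, top_k):
    """Keep the top_k words by co-occurrence count with the target word.

    Assumption: cooccurrence_counts is a toy dict standing in for counts
    that would be read from a web-scale N-gram collection.
    """
    ranked = sorted(cooccurrence_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {word for word, _ in ranked[:top_k]}


def em_naive_bayes(contexts, n_senses, n_iter=50, seed=0, alpha=0.1):
    """Assign each context (a list of feature words) to one latent sense."""
    rng = random.Random(seed)
    vocab = sorted({w for c in contexts for w in c})

    # Random soft initialisation of P(sense | context).
    post = []
    for _ in contexts:
        row = [rng.random() + 1e-3 for _ in range(n_senses)]
        total = sum(row)
        post.append([r / total for r in row])

    for _ in range(n_iter):
        # M-step: smoothed sense priors and per-sense word probabilities.
        prior = [
            (sum(p[s] for p in post) + alpha) / (len(contexts) + alpha * n_senses)
            for s in range(n_senses)
        ]
        word_prob = []
        for s in range(n_senses):
            counts = {w: alpha for w in vocab}
            for c, p in zip(contexts, post):
                for w in c:
                    counts[w] += p[s]
            total = sum(counts.values()) or 1.0
            word_prob.append({w: n / total for w, n in counts.items()})

        # E-step: posteriors computed in log space for numerical stability.
        for i, c in enumerate(contexts):
            logs = [
                math.log(prior[s]) + sum(math.log(word_prob[s][w]) for w in c)
                for s in range(n_senses)
            ]
            top = max(logs)
            exps = [math.exp(v - top) for v in logs]
            z = sum(exps)
            post[i] = [e / z for e in exps]

    # Hard assignment: the most probable sense for each context.
    return [max(range(n_senses), key=lambda s: p[s]) for p in post]


# Toy usage: filter raw contexts of "bank" down to the selected features.
toy_counts = {"money": 90, "loan": 70, "river": 80, "water": 60, "the": 5}
features = select_features(toy_counts, top_k=4)
raw_contexts = [
    ["the", "money", "loan"],
    ["the", "river", "water"],
    ["money", "loan"],
    ["river", "water"],
]
contexts = [[w for w in c if w in features] for c in raw_contexts]
labels = em_naive_bayes(contexts, n_senses=2)
```

The induced labels are arbitrary cluster indices, so evaluating them against a sense inventory would require a separate mapping step. Because EM only finds a local optimum, results can vary with the initialisation seed.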

Keywords: Bayesian classification, The EM algorithm, Word sense disambiguation, Unsupervised disambiguation, Web-scale N-grams

Paper URL: https://doi.org/10.1007/s10462-011-9306-y