HPS: High precision stemmer

Authors:

Highlights:

• A new unsupervised stemming algorithm is introduced in this article.

• The algorithm exploits both lexical and semantic information about words.

• Stemming performance is measured on six languages (Czech, Slovak, Polish, Hungarian, Spanish, and English).

• We outperform competing stemmers in an inflection removal test, an information retrieval task, and a language modeling task.

Keywords: Stemming, Morphology, Maximum entropy, Maximum mutual information, Language modeling, Information retrieval

Review history: Received 1 November 2013, Revised 21 August 2014, Accepted 27 August 2014, Available online 23 September 2014.

Official URL: https://doi.org/10.1016/j.ipm.2014.08.006