A systematic review of text stemming techniques

作者:Jasmeet Singh, Vishal Gupta

摘要

Stemming is a program that matches the morphological variants of the word to its root word. Stemming is extensively used as a pre-processing tool in the field of natural language processing, information retrieval, and language modeling. Though a lot of advancements have been made in the field, yet organized arrangement of the previous work and efforts are lacking in this field. In this paper, we present a review of the text stemming theory, algorithms, and applications. It first describes the existing literature relevant to text stemming by classifying it according to certain key parameters; then it describes the deep analysis of some well-known stemming algorithms on standard data sets. In the end, the current state-of-the-art and certain open issues related to unsupervised stemming are presented. The main aim of this paper is to provide an extensive and useful understanding of the important aspects of text stemming. The open issues and analysis of the current stemming techniques will help the researchers to think of new lines to conduct research in future.

论文关键词:Stemming, Natural language processing, Information retrieval, Language modeling

论文评审过程:

论文官网地址:https://doi.org/10.1007/s10462-016-9498-2