Character n-gram application for automatic new topic identification

Highlights:

• We used the character n-gram method to predict topic changes in search engine queries.

• We obtained more successful topic-change estimates than those reported in previous studies.

• We compared the character n-gram method with the Levenshtein edit-distance method.

• We evaluated ASPELL and the Google and Bing search engines as spelling-correction pre-processing methods.

• We conclude that Google can be used as a spelling-correction pre-processing method.
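
The two measures compared in the highlights can be illustrated with a minimal sketch. It is not the authors' implementation: the trigram size, the space padding, the Dice coefficient, the 0.2 threshold, and the helper names (`char_ngrams`, `ngram_similarity`, `levenshtein`, `is_new_topic`) are all assumptions chosen for illustration.

```python
def char_ngrams(text: str, n: int = 3) -> set[str]:
    """Return the set of character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower().strip()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a: str, b: str, n: int = 3) -> float:
    """Dice coefficient over character n-gram sets (an assumed similarity measure)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance with unit costs."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def is_new_topic(previous_query: str, current_query: str,
                 threshold: float = 0.2, n: int = 3) -> bool:
    """Flag a topic change when n-gram similarity falls below a hypothetical threshold."""
    return ngram_similarity(previous_query, current_query, n) < threshold
```

Under these assumptions, a small spelling variation such as "cheap flights paris" followed by "cheap flight paris" keeps most trigrams in common and is not flagged, whereas "cheap flights paris" followed by "python tutorial" shares no trigrams and is flagged as a new topic.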

Keywords: Content-ignorant algorithms, The character n-gram method, New topic identification, The Levenshtein edit-distance, Pre-processed spelling correction methods

Review history: Received 1 November 2011, Revised 5 August 2013, Accepted 26 June 2014, Available online 24 August 2014.

Official paper URL: https://doi.org/10.1016/j.ipm.2014.06.005