An efficient substring search method by using delayed keyword extraction

Abstract

In information retrieval systems, extracting appropriate keywords from documents is one of the most important and difficult operations. This paper proposes an effective substring search method that extends the Aho and Corasick (AC) machine, a pattern matching machine for multiple keywords. The proposed method extracts as many keyword candidates as possible and defers the selection of keywords suited to the user's purpose to the retrieval stage. It supports four types of substring search: exact, prefix, suffix, and proper substring. The paper also proposes an algorithm for constructing the retrieval structure that speeds up the substring search. Simulation results show that the retrieval time of the proposed method is as fast as that of the trie-based key retrieval method.
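As a rough illustration of the two ingredients named in the abstract, the following Python sketch builds a standard AC machine (goto, failure, and output functions) and labels a match by the four search types. It is not the paper's extended machine or its retrieval structure. The function names `build_ac`, `find_all`, and `match_type` are hypothetical, and defining the four types by where the query sits inside a stored keyword candidate is an assumption made here for illustration.

```python
from collections import deque


def build_ac(keywords):
    """Build goto, failure, and output tables of a standard AC machine."""
    goto = [{}]        # goto[state][char] -> next state
    output = [set()]   # keywords recognized on entering each state
    for word in keywords:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                output.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].add(word)

    fail = [0] * len(goto)
    queue = deque(goto[0].values())  # depth-1 states fail to the root
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit keywords that end at the failure state.
            output[nxt] |= output[fail[nxt]]
    return goto, fail, output


def find_all(text, machine):
    """Return (start, keyword) for every keyword occurrence in text."""
    goto, fail, output = machine
    state = 0
    hits = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in output[state]:
            hits.append((i - len(word) + 1, word))
    return hits


def match_type(start, query, candidate):
    """Assumed reading of the four search types: where the query
    occurrence at offset `start` lies inside the stored candidate."""
    end = start + len(query)
    if start == 0 and end == len(candidate):
        return "exact"
    if start == 0:
        return "prefix"
    if end == len(candidate):
        return "suffix"
    return "proper substring"


if __name__ == "__main__":
    machine = build_ac(["data", "base", "tab"])
    for start, word in sorted(find_all("database", machine)):
        print(word, match_type(start, word, "database"))
```

Here the machine scans one candidate string and every hit is classified afterwards. The paper instead proposes a dedicated retrieval structure to speed up this step, which the sketch does not attempt to reproduce.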

Keywords: Information retrieval, Pattern matching machine of Aho and Corasick, Delayed keyword extraction, Substring search, Single component (SC) and long component (LC) keyword

Review history: Accepted 20 September 2000; available online 31 May 2001.

Official paper URL: https://doi.org/10.1016/S0306-4573(00)00050-9