A reliable FAQ retrieval system using a query log classification technique based on latent semantic analysis



Abstract

To obtain high performance, previous work on FAQ retrieval used high-level knowledge bases or handcrafted rules. However, constructing these knowledge bases and rules is time- and effort-consuming, and it must be repeated whenever the application domain changes. To overcome this problem, we propose a high-performance FAQ retrieval system that uses only users' query logs as knowledge sources. At indexing time, the proposed system efficiently clusters users' query logs using classification techniques based on latent semantic analysis. At retrieval time, the proposed system smooths FAQs using the query log clusters. In the experiments, the proposed system outperformed conventional information retrieval systems in FAQ retrieval. Based on various experiments, we found that the proposed system could alleviate critical lexical disagreement problems in short document retrieval. In addition, we believe that the proposed system is more practical and reliable than previous FAQ retrieval systems because it uses only data-driven methods without high-level knowledge sources.
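The abstract does not spell out the classification or smoothing procedure, so the following is only a rough illustration under one simple reading, not the authors' method. It assumes that FAQs and query logs are decomposed jointly with a rank-k LSA (computed here by power iteration on the document-document Gram matrix rather than with a library SVD). Each log is assigned to the FAQ with the highest cosine similarity in that space if the score exceeds a threshold, and each FAQ is then smoothed by adding its cluster's term counts with a fixed weight. The stopword list, the toy FAQs and logs, `threshold`, `weight`, and all function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list for the toy data below (not from the paper).
STOPWORDS = {"how", "can", "i", "my", "what", "do", "you", "with", "a", "the", "is", "to"}


def bag(text):
    """Lower-cased bag of words with stopwords removed."""
    return Counter(t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS)


def dot(u, v):
    """Dot product of two sparse term vectors (dicts/Counters)."""
    return sum(w * v.get(t, 0) for t, w in u.items())


def term_cosine(u, v):
    """Cosine similarity of two sparse term vectors."""
    den = math.sqrt(dot(u, u) * dot(v, v))
    return dot(u, v) / den if den else 0.0


def cosine(u, v):
    """Cosine similarity of two dense vectors (lists)."""
    den = math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / den if den else 0.0


def top_eigenpairs(m, k, iters=1000):
    """Top-k eigenpairs of a symmetric PSD matrix via power iteration + deflation."""
    n = len(m)
    m = [row[:] for row in m]
    pairs = []
    for _ in range(k):
        v = [1.0] * n
        for _ in range(iters):
            w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * m[i][j] * v[j] for i in range(n) for j in range(n))
        pairs.append((lam, v))
        # Deflate so the next pass converges to the next-largest eigenvector.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs


def lsa_coords(bags, k):
    """Rank-k LSA document coordinates (rows of V_k Sigma_k).

    The eigenvalues of the doc-doc Gram matrix A^T A are the squared
    singular values of the term-document matrix A.
    """
    gram = [[dot(a, b) for b in bags] for a in bags]
    pairs = top_eigenpairs(gram, k)
    return [[math.sqrt(max(lam, 0.0)) * v[j] for lam, v in pairs] for j in range(len(bags))]


def cluster_logs(faqs, logs, k=2, threshold=0.5):
    """Assign each query log to its most similar FAQ in the joint LSA space."""
    coords = lsa_coords([bag(t) for t in faqs + logs], k)
    clusters = {i: [] for i in range(len(faqs))}
    for j in range(len(logs)):
        sims = [cosine(coords[len(faqs) + j], coords[i]) for i in range(len(faqs))]
        best = max(range(len(faqs)), key=lambda i: sims[i])
        if sims[best] >= threshold:  # weakly related logs stay unclustered
            clusters[best].append(j)
    return clusters


def smooth_faqs(faqs, logs, clusters, weight=0.5):
    """Add weight * term counts of each FAQ's log cluster to that FAQ's vector."""
    vectors = []
    for i, faq in enumerate(faqs):
        vec = bag(faq)
        for j in clusters[i]:
            for t, c in bag(logs[j]).items():
                vec[t] += weight * c
        vectors.append(vec)
    return vectors


def retrieve(query, faq_vectors):
    """Return (index of best FAQ, cosine scores) for a query."""
    q = bag(query)
    scores = [term_cosine(q, f) for f in faq_vectors]
    return max(range(len(scores)), key=lambda i: scores[i]), scores


faqs = ["How can I reset my password?", "What payment methods do you accept?"]
logs = ["forgot my password", "forgot login", "reset login password",
        "pay with credit card", "credit card payment accepted", "accept paypal"]
clusters = cluster_logs(faqs, logs)
print(clusters)  # {0: [0, 1, 2], 1: [3, 4, 5]}
best, scores = retrieve("I forgot my login", smooth_faqs(faqs, logs, clusters))
print(best)  # 0
```

In this toy run, the log "forgot login" shares no terms with the password FAQ. It still lands in that FAQ's cluster through co-occurrence with the other logs. As a result, the query "I forgot my login", which scores zero against both raw FAQs, is matched to the password FAQ after smoothing. This illustrates the kind of lexical disagreement the abstract refers to; the paper's actual classifier and smoothing weights may differ.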

Keywords: FAQ retrieval, Lexical disagreement problem, Query log clusters, Latent semantic analysis

Article history: Received 11 May 2006, Accepted 25 July 2006, Available online 6 October 2006.

Official paper page: https://doi.org/10.1016/j.ipm.2006.07.018