Question retrieval using combined queries in community question answering

作者:Saquib Khushhal, Abdul Majid, Syed Ali Abbas, Malik Sajjad Ahmed Nadeem, Saeed Arif Shah

摘要

Community question answering (cQA) has emerged as a popular service on the web; users can use it to ask and answer questions and access historical question-answer (QA) pairs. cQA retrieval, as an alternative to general web searches, has several advantages. First, user can register a query in the form of natural language sentences instead of a set of keywords; thus, they can present the required information more clearly and comprehensively. Second, the system returns several possible answers instead of a long list of ranked documents, thereby enhancing the efficient location of the desired answers. Question retrieval from a cQA archive, an essential function of cQA retrieval services, aims to retrieve historical QA pairs relevant to the query question. In this study, combined queries (combined inverted and nextword indexes) are proposed for question retrieval in cQA. The method performance is investigated for two different scenarios: (a) when only questions from QA pairs are used as documents, and (b) when QA pairs are used as documents. In the proposed method, combined indexes are first created for both queries and documents; then, different information retrieval (IR) models are used to retrieve relevant questions from the cQA archive. Evaluation is performed on a public Yahoo! Answers dataset; the results thereby obtained show that using combined queries for all three IR models (vector space model, Okapi model, and language model) improves performance in terms of the retrieval precision and ranking effectiveness. Notably, by using combined indexes when both QA pairs are used as documents, the retrieval and ranking effectiveness of these cQA retrieval models increases significantly.

论文关键词:Question retrieval, Community question-answering services, Combined queries, Combined indexes, Inverted index

论文评审过程:

论文官网地址:https://doi.org/10.1007/s10844-020-00612-x