Efficient question classification and retrieval using category information and word embedding on cQA services

Authors: Kyoungman Bae, Youngjoong Ko

Abstract

Question classification, the task of automatically assigning unlabeled questions to predefined categories (or topics), and the effective retrieval of similar questions are crucial aspects of an effective cQA service. For question classification, we first address the problems of estimating the distribution of words in each category and using it to compute word weights. We then apply an automatic expansion-word generation technique, based on our proposed weighting method and pseudo-relevance feedback, to question classification. Second, to address the lexical gap problem in question retrieval, we define the case frame of a sentence from its extracted components and derive a similarity measure, based on the case frame and word embeddings, that determines the similarity between two sentences. These similarities are then used to reorder the results of the first retrieval model. Consequently, the proposed methods significantly improve the performance of question classification and retrieval.
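The reordering step can be illustrated with a minimal sketch. It is not the authors' method: the case-frame similarity is not specified in the abstract, so the sketch substitutes a plain cosine similarity over averaged word vectors. The toy vectors in `EMB`, the function names, and the mixing weight `alpha` are all hypothetical and chosen only for illustration.

```python
import math

# Toy 2-d word vectors; hypothetical values for illustration only.
EMB = {
    "fix": [1.0, 0.0],
    "repair": [0.9, 0.1],
    "laptop": [0.0, 1.0],
    "notebook": [0.1, 0.9],
    "recipe": [-1.0, 0.0],
    "cake": [0.0, -1.0],
}


def sentence_vector(tokens):
    """Average the vectors of known tokens; return None if none are known."""
    vecs = [EMB[t] for t in tokens if t in EMB]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity; 0.0 when either vector is missing or zero."""
    if u is None or v is None:
        return 0.0
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rerank(query_tokens, candidates, alpha=0.5):
    """Reorder first-stage results by a mix of the first-stage score and
    embedding similarity. `candidates` is a list of (tokens, score) pairs;
    `alpha` (an assumed parameter) weights the first-stage score."""
    q = sentence_vector(query_tokens)
    scored = []
    for tokens, base_score in candidates:
        sim = cosine(q, sentence_vector(tokens))
        scored.append((alpha * base_score + (1 - alpha) * sim, tokens))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tokens for _, tokens in scored]


# The first-stage model ranks the unrelated question higher (0.6 vs 0.5)
# because the related one shares no surface words with the query.
first_stage = [(["cake", "recipe"], 0.6), (["repair", "notebook"], 0.5)]
reranked = rerank(["fix", "laptop"], first_stage)
```

In this toy case the query "fix laptop" shares no words with "repair notebook", which is the lexical gap the abstract refers to; the embedding term is what moves the related question to the top.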

Keywords: Question classification, Word weighting method, Category information, Pseudo-relevance feedback, Question expansion

Paper URL: https://doi.org/10.1007/s10844-019-00556-x