A two-domain coordinated sentence similarity scheme for question-answering robots regarding unpredictable outliers and non-orthogonal categories

作者:Boyang Li, Weisheng Xu, Zhiyu Xu, Jiaxin Li, Peng Peng

摘要

It is crucial and challenging for the question-answering robot (Qabot) to match the customer-input questions with the priori identification questions due to highly diversified expressions, especially in the case of Chinese. This article proposes a coordinated scheme to analyze the similarity between sentences in two independent domains instead of a single deep learning model. In the structure domain, the BLEU and data preprocessing are applied for binary analysis to discriminate the unpredictable outliers (illegal questions) to existing library. In the semantics domain, the MC-BERT model, which integrates the BERT encoder and the Multi-kernel convolutional top classifier, is developed to handle the non-orthogonality of class identification questions. The two-domain analyses are in parallel and the two similarity scores are coordinated for the final response. The linguistic features of Chinese are also taken into account. A realistic case of Qabot on energy trading service and finance is numerically studied. Computational results validate the effectiveness and accuracy of the proposed algorithm: Top-1 and Top-3 accuracies are 90.5% and 95.5%, respectively, which are significantly superior to the latest published results.

论文关键词:Natural language processing (NLP), Question-answering robot(Qabot), Similarity analysis, Structure domain, Semantics domain, Multi-kernel CNN enhanced BERT (MC-BERT)

论文评审过程:

论文官网地址:https://doi.org/10.1007/s10489-021-02269-7