Deep context modeling for multi-turn response selection in dialogue systems

Abstract

Multi-turn response selection is a core task in building intelligent dialogue systems. Most existing work focuses on modeling the semantic relationship between the context utterances and the candidate response with neural networks such as RNNs and various attention mechanisms. In this paper, we study how to leverage pre-trained language models (PTMs) for multi-turn response selection in retrieval-based chatbots. We propose a deep context modeling architecture (DCM) for multi-turn response selection that uses BERT as the context encoder. DCM is formulated as a four-module architecture: contextual encoding, utterance-to-response interaction, feature aggregation, and response selection. Moreover, in DCM we introduce next utterance prediction as a pre-training scheme on top of BERT, aiming to adapt general-purpose BERT to the inherent context continuity underlying multi-turn dialogue. Taking BERT as the backbone encoder, we then investigate a variety of strategies for performing response selection, with comprehensive comparisons. Empirical results on three public datasets in two languages show that our proposed model significantly outperforms strong existing models, pushing recall to 86.8% (+5.2% over BERT) on the Ubuntu Dialogue corpus and 68.5% (+6.4% over BERT) on the E-Commerce Dialogue corpus, and MAP and MRR to 61.6% and 64.9% respectively (+2.3% and +1.8% over BERT) on the Douban Conversation corpus, achieving new state-of-the-art performance for multi-turn response selection.
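To make the BERT-based formulation concrete, the sketch below shows one common way such models serialize a multi-turn context and a candidate response into a single encoder input. This is an illustrative assumption, not the paper's exact preprocessing: the `[EOU]` utterance separator and the `build_input` helper are hypothetical, and a real implementation would additionally apply WordPiece tokenization and length truncation before feeding the sequence to the encoder.

```python
def build_input(context_utterances, response,
                cls="[CLS]", sep="[SEP]", eou="[EOU]"):
    """Serialize a multi-turn context and one candidate response into a
    single BERT-style sequence. Utterances are joined with an [EOU]
    marker (a common convention, assumed here); the context and the
    response form the two BERT segments, delimited by [SEP]."""
    context = f" {eou} ".join(context_utterances)
    return f"{cls} {context} {sep} {response} {sep}"


# Example: a two-turn Ubuntu-style context with one candidate response.
seq = build_input(
    ["how do I install a package ?", "which package do you mean ?"],
    "the one for the nvidia driver",
)
```

The encoder then reads this sequence, and the `[CLS]` representation (or, in DCM, per-utterance interaction features aggregated across turns) is scored to rank each candidate response against the same context.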

Keywords: Response selection, Multi-turn dialogue, BERT, Chatbot

Article history: Received 4 May 2020, Revised 12 October 2020, Accepted 12 October 2020, Available online 9 November 2020, Version of Record 9 November 2020.

DOI: https://doi.org/10.1016/j.ipm.2020.102415