A semantic approach to extractive multi-document summarization: Applying sentence expansion for tuning of conceptual densities

Highlights:

• An unsupervised, language-independent framework for generic extractive multi-document summarization.

• Bridges the gap between short texts and conventional text processing methods by expanding sentences with respect to word sense disambiguation and tuning of conceptual densities in the sentences.

• The proposed approach dynamically determines the number of clusters and their initial centroids in order to identify the main concepts of the documents.

• Produces summaries with respect to information significance, minimum redundancy, maximum coverage, and cohesion.

• Learns a context-aware semantic model to estimate semantic similarities more accurately.

Keywords: Multi-document Extractive Summarization, Sentence Expansion, Conceptual Density Tuning, Word Embedding, Text Clustering, Language-independent Approach

Article history: Received 13 January 2020, Revised 19 May 2020, Accepted 11 June 2020, Available online 25 June 2020, Version of Record 25 June 2020.