Text summarization using topic-based vector space model and semantic measure

Authors:

Abstract

The primary shortcoming of extractive text summarization is redundancy, where multiple sentences conveying similar information are included in the summary. Many extractive text summarization methods have been proposed in the last two decades, but less attention has been paid to the redundancy issue. In this paper, we propose a text summarization technique that incorporates topic modeling and a semantic measure within the vector space model to find the extractive summary of a given text. Our main objective is to address the redundancy problem associated with summarization methods and to include in the summary only those sentences that represent the maximum number of topics embedded in the given text document. We generate the topic vector of the given document by representing its sentences in an intermediate form using a vector space model and topic modeling. Moreover, to make the proposed method efficient, we incorporate a semantic similarity measure to find the relevance of each sentence. We introduce two different ways to create the topic vector from the given document, namely the Combined topic vector and the Individual topic vector approaches. Evaluation results on two datasets show that the summaries generated by both variants of the proposed method (Combined and Individual topic vector) are closer to human-generated summaries than those produced by existing text summarization methods.
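
The abstract describes the approach only at a high level. As a rough illustration of the general idea (scoring sentences by their relevance to a document-level vector while penalizing similarity to sentences already chosen), the following is a minimal sketch under stated assumptions. It is NOT the authors' Combined or Individual topic vector method: it uses plain term-count vectors in place of the paper's topic vectors, cosine similarity in place of its semantic measure, and a greedy Maximal Marginal Relevance (MMR) style selection as a swapped-in, generic redundancy control. The names `tokenize`, `cosine`, `summarize` and the trade-off parameter `lam` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase word tokens only; a real system would also drop stop words and stem.
    return re.findall(r"[a-z]+", text.lower())


def cosine(u, v):
    # Cosine similarity between two sparse term-count vectors stored as Counters.
    dot = sum(u[t] * v[t] for t in u if t in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def summarize(sentences, k, lam=0.5):
    """Greedy MMR-style extractive selection (assumed stand-in, not the paper's method).

    The "topic vector" here is simply the summed term counts of all sentences.
    lam weights relevance to that vector against redundancy, i.e. the highest
    similarity to any sentence already selected.
    """
    vecs = [Counter(tokenize(s)) for s in sentences]
    topic = Counter()
    for v in vecs:
        topic.update(v)  # add this sentence's counts to the document vector

    selected = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(vecs[i], topic)
            redundancy = max((cosine(vecs[i], vecs[j]) for j in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)

    # Return the chosen sentences in their original document order.
    return [sentences[i] for i in sorted(selected)]


if __name__ == "__main__":
    doc = [
        "The cat sat on the mat.",
        "The cat sat on the mat today.",
        "Dogs bark loudly at night.",
    ]
    print(summarize(doc, 2))
```

In this toy input the two near-duplicate "cat" sentences are both highly relevant, but once one is selected the redundancy penalty favors the unrelated "Dogs" sentence. A larger `lam` shifts the balance back toward pure relevance, so its value is a tuning assumption rather than a recommended setting.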

Keywords: Extractive summarization, Topic modeling, Relevance measure, Vector space model, Semantic measure

Article history: Received 22 April 2020, Revised 9 November 2020, Accepted 24 January 2021, Available online 9 February 2021, Version of Record 9 February 2021.

Official paper page: https://doi.org/10.1016/j.ipm.2021.102536