Generic technologies for single- and multi-document summarization

Authors:

Abstract

The technologies for single- and multi-document summarization described and evaluated in this article can be applied to heterogeneous texts and to different summarization tasks. They comprise extracting important sentences from the documents, compressing those sentences to their essential or relevant content, and detecting redundant content across sentences. The technologies were tested at the Document Understanding Conference, organized by the National Institute of Standards and Technology (USA), in 2002 and 2003, where the system obtained good to very good results. We also tested our summarization system on a variety of English encyclopedia texts and on Dutch magazine articles. The results show that relying on generic linguistic resources and statistical techniques offers a basis for text summarization.
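
As a rough illustration of two of the steps named in the abstract (sentence extraction and redundancy detection), the following minimal Python sketch scores sentences by the corpus frequency of their content words and skips any sentence that is too similar to one already selected. This is not the article's method: the scoring rule, the cosine-similarity test, the stopword list, and the names `tokenize`, `cosine`, `summarize`, `max_sentences`, and `redundancy_threshold` are all assumptions made for illustration, and sentence compression is not covered.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not taken from the article).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "on", "is", "are",
             "for", "that", "this", "it", "with"}


def tokenize(text):
    """Return lowercase alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def summarize(sentences, max_sentences=2, redundancy_threshold=0.5):
    """Select high-scoring sentences, skipping near-duplicates of chosen ones."""
    vectors = [Counter(tokenize(s)) for s in sentences]

    # Term frequencies over the whole input (one or several documents).
    corpus_tf = Counter()
    for v in vectors:
        corpus_tf.update(v)

    # Hypothetical score: average corpus frequency of a sentence's content words.
    scores = [
        sum(corpus_tf[t] * c for t, c in v.items()) / max(sum(v.values()), 1)
        for v in vectors
    ]

    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen = []
    for i in ranked:
        if len(chosen) >= max_sentences:
            break
        # Redundancy check: keep the sentence only if it is dissimilar
        # to every sentence selected so far.
        if all(cosine(vectors[i], vectors[j]) < redundancy_threshold for j in chosen):
            chosen.append(i)

    # Return the selected sentences in their original order.
    return [sentences[i] for i in sorted(chosen)]


docs = [
    "The summarization system extracts important sentences.",
    "Important sentences are extracted by the summarization system.",
    "Dutch magazine articles were also tested.",
]
summary = summarize(docs)
```

In this toy input the second sentence shares most of its content words with the first, so the redundancy check drops it and the summary keeps the first and third sentences. A system of the kind the abstract describes would presumably rely on richer linguistic resources than raw word counts.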

Keywords: Text summarization, Topic detection, Headline construction

Article history: Received 30 July 2003, Accepted 17 December 2003, Available online 5 March 2004.

Official paper URL: https://doi.org/10.1016/j.ipm.2003.12.006