Candidate sentence selection for extractive text summarization

Highlights:

• A new benchmark dataset for studies on automatic text summarization, which contains both human-generated abstracts and extracts, was proposed.

• The extractive summarization problem was revisited.

• The syntactic and semantic feature spaces used in summarization were comprehensively investigated.

• An ensemble feature space was introduced for a new long short-term memory-based neural network model (LSTM-NN).

• Experimental results showed that the ensemble feature space performed remarkably better than syntactic or semantic features used alone, and that the proposed LSTM-NN also outperformed state-of-the-art models for extractive summarization.


Article history: Received 4 April 2020, Revised 27 June 2020, Accepted 11 July 2020, Available online 12 August 2020, Version of Record 12 August 2020.