Web document summarization by exploiting social context with matrix co-factorization

Abstract

On social media, users often post information related to the events described in a Web document. This information has two important properties: (i) it reflects the content of an event and (ii) it shares hidden topics with sentences in the main document. In this paper, we present a novel model that captures how document sentences and user posts (comments or tweets) share hidden topics, and uses the relevant posts to summarize Web documents. Unlike previous methods, which are usually based on hand-crafted features, our approach ranks document sentences and user posts by their importance to the topics. The sentence-user-post relation is formulated as a shared topic matrix, which represents their mutual reinforcement. Our proposed matrix co-factorization algorithm computes a score for each document sentence and user post and extracts the top-ranked sentences and comments (or tweets) as a summary. We apply the model to three social context summarization datasets in two languages, English and Vietnamese, and also to DUC 2004 (a standard corpus for the traditional summarization task). According to the experimental results, our model significantly outperforms basic matrix factorization and achieves ROUGE scores competitive with state-of-the-art methods.
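
The abstract does not give the objective function or update rules, so the following is only a minimal sketch of the general idea of co-factorization, not the authors' algorithm. It assumes a term-by-sentence matrix `Xs` and a term-by-post matrix `Xc` that are factorized with one shared non-negative topic basis `W`, using standard multiplicative updates for a squared-error objective. The names `co_factorize` and `rank_items`, the toy matrices, and the scoring heuristic (total mass of each item's reconstruction) are all hypothetical choices made for illustration.

```python
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mult_update(M, num, den, eps=1e-9):
    """Element-wise multiplicative update: M * num / (den + eps)."""
    return [[m * n / (d + eps) for m, n, d in zip(rm, rn, rd)]
            for rm, rn, rd in zip(M, num, den)]

def sq_error(X, W, H):
    """Squared reconstruction error ||X - W H||^2."""
    R = matmul(W, H)
    return sum((x - r) ** 2 for rx, rr in zip(X, R) for x, r in zip(rx, rr))

def co_factorize(Xs, Xc, k=2, iters=200, seed=0):
    """Assumed model: Xs ~ W Hs and Xc ~ W Hc with a shared topic basis W."""
    rng = random.Random(seed)

    def rand(rows, cols):
        # Strictly positive start so multiplicative updates can move entries.
        return [[rng.random() + 0.1 for _ in range(cols)] for _ in range(rows)]

    W = rand(len(Xs), k)
    Hs = rand(k, len(Xs[0]))
    Hc = rand(k, len(Xc[0]))
    for _ in range(iters):
        # Update the topic loadings of sentences and posts with W fixed.
        Wt = transpose(W)
        WtW = matmul(Wt, W)
        Hs = mult_update(Hs, matmul(Wt, Xs), matmul(WtW, Hs))
        Hc = mult_update(Hc, matmul(Wt, Xc), matmul(WtW, Hc))
        # Update the shared basis using evidence from both matrices.
        Hst, Hct = transpose(Hs), transpose(Hc)
        num = add(matmul(Xs, Hst), matmul(Xc, Hct))
        den = matmul(W, add(matmul(Hs, Hst), matmul(Hc, Hct)))
        W = mult_update(W, num, den)
    return W, Hs, Hc

def rank_items(W, H):
    """Rank columns of H (sentences or posts), highest score first.

    Assumed heuristic: score = sum of the entries of W h_j, which does not
    depend on how scale is split between W and H.
    """
    topic_mass = [sum(col) for col in zip(*W)]
    scores = [sum(m * h for m, h in zip(topic_mass, col)) for col in zip(*H)]
    return sorted(range(len(scores)), key=lambda j: -scores[j])

# Toy data (hypothetical): 5 terms, 4 document sentences, 3 user posts.
Xs = [[2, 0, 1, 0], [1, 0, 2, 0], [0, 3, 0, 1], [0, 1, 0, 2], [1, 1, 0, 0]]
Xc = [[1, 0, 2], [2, 0, 1], [0, 2, 0], [0, 1, 0], [0, 0, 1]]
```

Calling `co_factorize(Xs, Xc)` and then `rank_items(W, Hs)` and `rank_items(W, Hc)` gives one ordering for sentences and one for posts; taking the first few indices of each would correspond to the extraction step described in the abstract. Because `W` is shared, a topic that is strongly supported by user posts also influences which document sentences score highly, which is one simple way to model mutual reinforcement.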

Keywords: Data mining, Information retrieval, Document summarization, Social context summarization, Matrix factorization

Review history: Received 1 June 2018, Revised 27 October 2018, Accepted 7 December 2018, Available online 21 January 2019, Version of Record 21 January 2019.

Official paper URL: https://doi.org/10.1016/j.ipm.2018.12.006