D2S: Document-to-sentence framework for novelty detection

Authors: Flora S. Tsai, Yi Zhang

Abstract

Novelty detection aims to identify novel information in an incoming stream of documents. In this paper, we propose a new framework for document-level novelty detection using document-to-sentence (D2S) annotations and discuss the applicability of this method. D2S first segments a document into sentences, determines the novelty of each sentence, and then computes the document-level novelty score based on a fixed threshold. Experimental results on APWSJ data show that D2S outperforms standard document-level novelty detection in terms of redundancy-precision (RP) and redundancy-recall (RR). We applied D2S to the document-level data from the TREC 2004 and TREC 2003 Novelty Tracks and found that D2S is useful in detecting novel information in data with a high percentage of novel documents. However, D2S shows a strong capability to detect redundant information regardless of the percentage of novel documents. D2S has been successfully integrated into a real-world novelty detection system.
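The three D2S steps named in the abstract (segment into sentences, judge each sentence, aggregate to a document-level decision) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the regex sentence splitter, the bag-of-words cosine similarity, the reading of the document score as the fraction of novel sentences, and the names `split_sentences`, `sentence_is_novel`, `d2s_score`, `sim_threshold`, and `doc_threshold` are all assumptions made for illustration.

```python
import re
from collections import Counter
from math import sqrt


def split_sentences(text):
    """Naive sentence splitter (assumption): break after ., ! or ?."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def bag_of_words(sentence):
    """Lowercased term-frequency vector for one sentence."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def sentence_is_novel(sentence, history, sim_threshold=0.6):
    """A sentence is novel if no history sentence is too similar to it.

    The cosine metric and the 0.6 cutoff are illustrative assumptions.
    """
    vec = bag_of_words(sentence)
    return all(cosine(vec, seen) < sim_threshold for seen in history)


def d2s_score(document, history, sim_threshold=0.6, doc_threshold=0.5):
    """Return (is_novel, score) for a document.

    Assumption: score is the fraction of novel sentences, and the
    document is novel when the score reaches a fixed threshold.
    """
    sentences = split_sentences(document)
    if not sentences:
        return False, 0.0
    novel = sum(sentence_is_novel(s, history, sim_threshold) for s in sentences)
    score = novel / len(sentences)
    return score >= doc_threshold, score


def add_to_history(document, history):
    """Store the sentence vectors of a processed document."""
    history.extend(bag_of_words(s) for s in split_sentences(document))


if __name__ == "__main__":
    history = []
    stream = [
        "The cat sat on the mat. Dogs bark loudly.",
        "The cat sat on the mat. Dogs bark loudly.",
        "The cat sat on the mat. Markets rallied after the announcement.",
    ]
    for doc in stream:
        print(d2s_score(doc, history), doc)
        add_to_history(doc, history)
```

In this sketch a document is compared only against sentences from earlier documents, so repeated sentences inside one document do not count against it; whether the original system does the same is not stated in the abstract.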

Keywords: Novelty detection, Redundancy, Sentence segmentation, Document novelty, Novelty dataset, Text mining

Review process:

Paper URL: https://doi.org/10.1007/s10115-010-0372-2