Extraction of field-coherent passages

Authors:

Highlights:

Abstract

Identifying text that is substantially independent of adjacent material is important. This paper presents a technique for dividing text into field-coherent passages. The method is based on extracting field-associated words or phrases from the text and determining how topics grow, shrink and shift from sentence to sentence. We propose measures of topic continuity and transition and suggest how they may be used to find passage boundaries. Using a collection of 12,500 documents, we obtained an average precision of 88% and a recall of 78% on a training document set.
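
To make the idea of splitting text where the topic shifts more concrete, here is a minimal Python sketch. It is not the paper's method: the abstract does not define its continuity and transition measures, so the sketch substitutes a much simpler rule (start a new passage when the dominant field of a sentence differs from that of the current passage). The dictionary `FIELD_TERMS` and the functions `sentence_field` and `segment` are hypothetical names introduced only for illustration.

```python
import re
from collections import Counter

# Hypothetical field-associated terms (assumption, not from the paper);
# a real system would rely on a much larger dictionary.
FIELD_TERMS = {
    "pitcher": "sports",
    "inning": "sports",
    "election": "politics",
    "senate": "politics",
}


def sentence_field(sentence):
    """Return the most frequent field among the sentence's terms, or None."""
    words = re.findall(r"[a-z]+", sentence.lower())
    counts = Counter(FIELD_TERMS[w] for w in words if w in FIELD_TERMS)
    return counts.most_common(1)[0][0] if counts else None


def segment(sentences):
    """Group consecutive sentences into passages, splitting where the field changes.

    Simplified stand-in for the paper's measures: a sentence with no
    field-associated term continues the current passage (continuity), and a
    sentence with a different dominant field opens a new passage (transition).
    """
    passages = []  # list of (field, [sentence indices])
    for i, sentence in enumerate(sentences):
        field = sentence_field(sentence)
        if passages and (field is None or field == passages[-1][0]):
            passages[-1][1].append(i)  # continuity: extend the current passage
        elif passages and passages[-1][0] is None:
            # The passage so far had no known field; adopt this sentence's field.
            passages[-1] = (field, passages[-1][1] + [i])
        else:
            passages.append((field, [i]))  # transition: open a new passage
    return passages


sentences = [
    "The pitcher threw well in the first inning.",
    "He looked relaxed.",
    "The senate election is next week.",
    "Senate leaders disagree.",
]
print(segment(sentences))
```

In this toy input the second sentence carries no field-associated term, so it stays with the preceding sports passage, and the boundary falls before the third sentence, where the dominant field changes to politics.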

Keywords: Field-associated term, Topic matter: continuity and transition, Document classification

Review history: Received 4 August 2000, Accepted 22 March 2001, Available online 27 November 2001.

Official paper URL: https://doi.org/10.1016/S0306-4573(01)00032-2