Documents similarity measurement using field association terms

Authors:


Abstract

Conventional approaches to text analysis and information retrieval measure document similarity by considering all of the information in a text. This makes them relatively inefficient for processing large text collections that span heterogeneous subject areas. This paper outlines a new text manipulation system, FA-Sim, that is useful for retrieving information from large heterogeneous texts and for recognizing content similarity between text excerpts. FA-Sim is based on flexible text matching procedures carried out in various contexts and at various field ranks. Instead of comparing all of the information in two texts, FA-Sim measures their similarity using specific field association (FA) terms. FA-Sim computes similarity between texts faster, and with higher similarity scores, than the two other analysis methods it is compared against. As a result, Recall and Precision improved significantly, by 39% and 37% respectively, over these two traditional methods.
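The minimal Python sketch below illustrates the core idea of comparing texts only through FA terms. It is not the authors' FA-Sim implementation. It assumes a hypothetical dictionary `FA_TERMS` that maps each FA term to a single field. Each text is represented only by counts of the fields its FA terms point to, and two texts are compared with cosine similarity. The real system's field ranks and flexible matching procedures are not described in this abstract, so they are not modeled here. The helper `recall_precision` uses the standard definitions of the two evaluation metrics.

```python
from collections import Counter
from math import sqrt

# Hypothetical FA-term dictionary (assumption): term -> associated field.
# The actual FA-Sim term lists and field ranks are not given in the abstract.
FA_TERMS = {
    "pitcher": "baseball", "batter": "baseball", "fastball": "baseball",
    "senate": "politics", "budget": "politics", "bill": "politics",
}

def fa_vector(text: str, fa_terms: dict[str, str]) -> Counter:
    """Keep only tokens that are FA terms, counted by the field they indicate."""
    tokens = text.lower().split()
    return Counter(fa_terms[t] for t in tokens if t in fa_terms)

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse field-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def recall_precision(retrieved: set, relevant: set) -> tuple[float, float]:
    """Recall = hits / |relevant|; Precision = hits / |retrieved|."""
    hits = len(retrieved & relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision

if __name__ == "__main__":
    d1 = "the pitcher threw a fastball to the batter"
    d2 = "a batter hit the fastball off the pitcher"
    d3 = "the senate passed the budget bill"
    v1, v2, v3 = (fa_vector(d, FA_TERMS) for d in (d1, d2, d3))
    print(cosine(v1, v2))  # both texts point only to "baseball"
    print(cosine(v1, v3))  # no shared field
    print(recall_precision({"d1", "d2", "d4"}, {"d1", "d2", "d3"}))
```

The sketch discards every token that is not an FA term before comparing. That is the source of the speed advantage the abstract describes: the comparison touches a handful of field counts instead of the full vocabulary of each text.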

Keywords: Information retrieval, FA terms, FA-Sim, Recall, Precision

Article history: Received 4 October 2002, Accepted 12 February 2003, Available online 26 April 2003.

Official paper URL: https://doi.org/10.1016/S0306-4573(03)00019-0