Measuring quality of similarity functions in approximate data matching

Authors:

Abstract

This paper presents a method for assessing the quality of similarity functions. The scenario considered is approximate data matching, in which it is necessary to determine whether two data instances represent the same real-world object. Our method is based on the semi-automatic estimation of optimal threshold values. We propose two methods for performing this estimation: the first is an algorithm based on a reward function, and the second is a statistical method. Experiments were carried out to validate the proposed techniques. The results show that both threshold estimation methods produce similar results. The output of these methods was used to design a grading function for similarity functions. This grading function, called discernability, was used to compare a number of similarity functions applied to an experimental data set.
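
The abstract does not define the reward function, so the following is only a minimal sketch of the general idea of reward-based threshold estimation, not the paper's algorithm. It assumes a small set of record pairs that already carry a similarity score and a manual match/non-match label, and it assumes a simple reward of +1 for each correctly classified pair and -1 for each misclassified pair. The names `reward`, `estimate_threshold`, and `sample_pairs` are hypothetical, and the scores are made-up illustration data.

```python
from typing import List, Tuple

# Each entry is (similarity score, True if the pair is a real match).
ScoredPair = Tuple[float, bool]


def reward(threshold: float, scored_pairs: List[ScoredPair]) -> int:
    """Assumed reward: +1 per correct decision, -1 per wrong decision."""
    total = 0
    for score, is_match in scored_pairs:
        predicted_match = score >= threshold
        total += 1 if predicted_match == is_match else -1
    return total


def estimate_threshold(scored_pairs: List[ScoredPair]) -> float:
    """Return the observed score that maximizes the assumed reward.

    Only scores that occur in the data are tried as candidates; on ties,
    the lowest candidate wins because max() keeps the first maximum.
    """
    candidates = sorted({score for score, _ in scored_pairs})
    return max(candidates, key=lambda t: reward(t, scored_pairs))


# Made-up labeled sample, for illustration only.
sample_pairs: List[ScoredPair] = [
    (0.95, True),
    (0.90, True),
    (0.70, True),
    (0.65, False),
    (0.60, False),
    (0.40, False),
]

best = estimate_threshold(sample_pairs)
print(best, reward(best, sample_pairs))
```

Under these assumptions, a similarity function whose match and non-match scores separate cleanly around the estimated threshold earns a higher reward than one whose scores overlap. That is the intuition behind grading similarity functions, although the paper's discernability measure is defined in the full text rather than in this abstract.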

Keywords: Approximate data matching, Similarity functions, Retrieval evaluation

Review history: Received 3 July 2006, Revised 23 August 2006, Accepted 5 September 2006, Available online 22 November 2006.

Official paper URL: https://doi.org/10.1016/j.joi.2006.09.001