Discriminative and deterministic approaches towards entity resolution

Authors: Byung-Won On, Ingyu Lee, Gyu Sang Choi, Ho-Sik Park

Abstract

To address the entity resolution problem, existing approaches usually proceed in two steps. Given two lists of records, the first step selects a small set of potentially duplicate records (a candidate set) using index structures and algorithms designed for efficient entity resolution. The second step applies a given similarity function to quantify the similarity of the records in the candidate set. In real applications, however, selecting appropriate indexing techniques and similarity functions is a non-trivial task. In this paper, we tackle the problem of identifying indexing techniques and similarity functions using both discriminative and deterministic approaches that select the best indexing and similarity measures. According to our experimental results, the proposed solution, which combines the discriminative and deterministic approaches, achieves an average accuracy of more than 90% within hundreds of seconds.
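The generic two-step pipeline described in the abstract (blocking to build a candidate set, then scoring candidates with a similarity function) can be illustrated with a minimal sketch. This is not the authors' method: the blocking key (first three characters), the similarity function (Jaccard over character bigrams), the threshold of 0.5, and all function names are assumptions chosen only for illustration. The paper's actual contribution, selecting the best indexing technique and similarity measure, is not shown here.

```python
from collections import defaultdict


def blocking_key(name):
    """Hypothetical blocking key: first three characters of the lowercased name."""
    return name.strip().lower()[:3]


def bigrams(text):
    """Set of character bigrams of the lowercased, stripped string."""
    s = text.strip().lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def jaccard(a, b):
    """Jaccard similarity of the two strings' bigram sets (assumed measure)."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        return 1.0
    return len(x & y) / len(x | y)


def resolve(list_a, list_b, threshold=0.5):
    """Return (record_a, record_b, score) for candidate pairs scoring >= threshold."""
    # Step 1 (blocking): index list_b so that only records sharing a
    # blocking key with a record in list_a become candidates.
    index = defaultdict(list)
    for rec in list_b:
        index[blocking_key(rec)].append(rec)

    # Step 2 (similarity): score each candidate pair and keep likely duplicates.
    matches = []
    for rec_a in list_a:
        for rec_b in index.get(blocking_key(rec_a), []):
            score = jaccard(rec_a, rec_b)
            if score >= threshold:
                matches.append((rec_a, rec_b, score))
    return matches


if __name__ == "__main__":
    left = ["Byung-Won On", "Ingyu Lee"]
    right = ["Byung Won On", "Ingyu Li", "Ho-Sik Park"]
    for a, b, score in resolve(left, right):
        print(a, "<->", b, round(score, 3))
```

Blocking avoids comparing every record in one list against every record in the other, at the risk of missing duplicates whose keys differ. That trade-off is one reason the choice of index and similarity function matters in practice.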

Keywords: Entity resolution, Approximate string matching, Similarities, Support vector machines, Blocking techniques

Official paper URL: https://doi.org/10.1007/s10844-014-0308-5