Efficient discovery of similarity constraints for matching dependencies

Authors:

Abstract

Matching dependencies (MDs) have recently been proposed for specifying matching rules for object identification. Like functional dependencies (with conditions), MDs can also be applied to various data quality applications, such as detecting violations of integrity constraints. In this paper, we study the problem of discovering similarity constraints for matching dependencies from a given database instance. First, we introduce two measures, support and confidence, for evaluating the utility of MDs on the given data. Then, we study the discovery of MDs that meet given utility requirements on support and confidence. We develop exact algorithms, together with pruning strategies that improve time performance. Since the exact algorithms have to traverse all the data during the computation, we also propose an approximate solution that uses only part of the data, and we derive a bound on the relative error introduced by the approximation. Finally, our experimental evaluation demonstrates the efficiency of the proposed methods.
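To make the two utility measures concrete, here is a minimal Python sketch of how support and confidence might be computed for one candidate MD over all tuple pairs. The abstract does not give the formal definitions, so the ones used here are assumptions borrowed from association-rule mining: support is the fraction of tuple pairs that satisfy the similarity constraints on both sides, and confidence is that count divided by the number of pairs satisfying the left-hand side. The token-level Jaccard similarity, the function names, and the threshold dictionaries are all hypothetical choices for illustration, not the paper's method.

```python
from itertools import combinations


def jaccard(a, b):
    """Token-level Jaccard similarity in [0, 1] (an assumed similarity metric)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def satisfies(t1, t2, thresholds):
    """True if every attribute meets its minimum similarity threshold."""
    return all(jaccard(t1[attr], t2[attr]) >= thr
               for attr, thr in thresholds.items())


def md_support_confidence(rows, lhs, rhs):
    """Evaluate the candidate MD `lhs -> rhs` over all unordered tuple pairs.

    lhs, rhs: dicts mapping attribute name -> minimum similarity threshold.
    Returns (support, confidence) under the assumed definitions above.
    """
    total = lhs_ok = both_ok = 0
    for t1, t2 in combinations(rows, 2):
        total += 1
        if satisfies(t1, t2, lhs):
            lhs_ok += 1
            if satisfies(t1, t2, rhs):
                both_ok += 1
    support = both_ok / total if total else 0.0
    confidence = both_ok / lhs_ok if lhs_ok else 0.0
    return support, confidence


# Hypothetical toy relation: similar names should imply identical phones.
rows = [
    {"name": "john smith", "phone": "555 1234"},
    {"name": "john smith jr", "phone": "555 1234"},
    {"name": "mary jones", "phone": "555 9999"},
    {"name": "john smyth", "phone": "555 0000"},
]
support, confidence = md_support_confidence(
    rows, lhs={"name": 0.3}, rhs={"phone": 1.0}
)
```

This exhaustive version visits every tuple pair, which is the cost the abstract attributes to the exact algorithms. Running the same function on a sample of `rows` would be one simple way to use only part of the data, though it comes without the error bound the paper derives.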

Keywords: Data dependencies, Matching dependencies, Management of integrity constraints, Database integration

Article history: Received 15 December 2011, Revised 12 June 2013, Accepted 12 June 2013, Available online 29 June 2013.

Official paper URL: https://doi.org/10.1016/j.datak.2013.06.003