Optimizing entity join queries when data transmission cost dominates

Authors:

Highlights:

Abstract

Heterogeneity is inherent in a multidatabase environment. For example, a real-world entity may be represented differently in relations of different databases; in particular, the keys of these relations may be incompatible. In this paper, we consider processing entity join queries when data transmission cost dominates. An entity join operation ‘integrates’ tuples that represent the same entities from different relations, in which inconsistent data may exist. A natural way to process an entity join is to transmit both relations to a single site, resolve any conflicts between corresponding attributes, and then process the join, which is very costly. We propose an approach that correctly transforms a global query into local subqueries, so that entity join queries can be preprocessed at multiple sites in an attempt to lower the cost of data transmission. In addition, we propose an extension of the traditional semijoin, named the extended semijoin, to further reduce the cost of data transmission in entity join query processing.
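
To illustrate why a semijoin can lower transmission cost, the following is a minimal sketch of the traditional semijoin only. It is not the extended semijoin proposed in the paper, whose definition is not given in this abstract. The sketch assumes both relations share a compatible key, which is exactly the assumption the paper relaxes; the relation contents and the names `semijoin_reduce` and `join` are hypothetical.

```python
def semijoin_reduce(r, s, key):
    """Keep only the tuples of s whose key value also appears in r.

    In a distributed setting, only the set of key values from r would be
    shipped to the site holding s, instead of the whole relation.
    """
    r_keys = {t[key] for t in r}  # projection of r on the join key
    return [t for t in s if t[key] in r_keys]


def join(r, s, key):
    """Plain equijoin on a shared key (assumes compatible keys)."""
    index = {}
    for t in s:
        index.setdefault(t[key], []).append(t)
    return [{**a, **b} for a in r for b in index.get(a[key], [])]


# Hypothetical relations held at two different sites.
r = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
s = [
    {"id": 2, "city": "Oslo"},
    {"id": 3, "city": "Rome"},
    {"id": 4, "city": "Lima"},
]

reduced_s = semijoin_reduce(r, s, "id")  # only matching tuples travel back
result = join(r, reduced_s, "id")
```

In this sketch, one of the three tuples of `s` needs to be transmitted after the reduction, and the join result is the same as joining the full relations.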

Keywords: Entity join, Extended semijoin, Inconsistent data, Local processing, Multidatabase, Query optimization, Query transformation, Selectivity

Review history: Received 26 April 1995, Revised 2 May 1996, Accepted 14 October 1996, Available online 19 May 1998.

Official paper URL: https://doi.org/10.1016/S0169-023X(96)00052-3