View Inference for Heterogeneous XML Information Integration

Authors: Euna Jeong, Chun-Nan Hsu

Abstract

This paper proposes a novel approach to integrating heterogeneous XML DTDs. With this approach, an information agent can be easily extended to integrate heterogeneous XML-based content and perform federated search. Based on a tree grammar inference technique, the approach derives an integrated view of XML DTDs within an information integration framework. The derivation takes advantage of naming and structural similarities among DTDs in similar domains. The complete approach consists of three main steps. (1) DTD clustering groups DTDs from similar domains into classes. (2) The schema learner applies a tree grammar inference technique to generate a set of tree grammar rules from the DTDs in a class produced by the previous step. (3) The minimizer optimizes the rules generated in the previous step, transforms them into an integrated view, and generates source descriptions. We have implemented the approach in a system called DEEP and tested the system on several domains. Experimental results show that the system can effectively and efficiently integrate radically different DTDs.
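To make step (1) concrete, the following is a minimal sketch of grouping DTDs by naming similarity alone. It is not the clustering method used in DEEP, which the abstract does not specify: the Jaccard measure, the 0.3 threshold, the greedy assignment rule, and every function and variable name here are assumptions chosen for illustration, and structural similarity is ignored entirely.

```python
import re

def element_names(dtd_text):
    """Collect the element names declared in a DTD string."""
    return set(re.findall(r"<!ELEMENT\s+([\w.:-]+)", dtd_text))

def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_dtds(dtds, threshold=0.3):
    """Greedy clustering (an assumed rule, not DEEP's): a DTD joins the
    first existing class that has a sufficiently similar member;
    otherwise it starts a new class."""
    names = {name: element_names(text) for name, text in dtds.items()}
    classes = []  # each class is a list of DTD names
    for name in dtds:
        for cls in classes:
            if any(jaccard(names[name], names[other]) >= threshold
                   for other in cls):
                cls.append(name)
                break
        else:
            classes.append([name])
    return classes

# Hypothetical toy DTDs: two book catalogs and one weather feed.
example_dtds = {
    "books_a": "<!ELEMENT book (title, author, price)> <!ELEMENT title (#PCDATA)> "
               "<!ELEMENT author (#PCDATA)> <!ELEMENT price (#PCDATA)>",
    "books_b": "<!ELEMENT book (title, writer, isbn)> <!ELEMENT title (#PCDATA)> "
               "<!ELEMENT writer (#PCDATA)> <!ELEMENT isbn (#PCDATA)>",
    "weather": "<!ELEMENT forecast (city, temp)> <!ELEMENT city (#PCDATA)> "
               "<!ELEMENT temp (#PCDATA)>",
}

classes = cluster_dtds(example_dtds)
# classes == [['books_a', 'books_b'], ['weather']]
```

The two book DTDs share `book` and `title` out of six distinct element names, so their similarity of 1/3 clears the assumed threshold, while the weather DTD shares no names and forms its own class. Each resulting class would then be the input to the schema learner in step (2).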

Keywords: XML DTD, mark-up schemes, semistructured data, federated search, distributed databases, intelligent agents

Official paper URL: https://doi.org/10.1023/A:1020999107730