Web data retrieval and extraction

Authors:

Abstract

We present the Object-Web Mediator for querying integrated Web data sources. It is composed of a retrieval component, based on an intermediate object view mechanism and search views, and an XML engine. Search views map source capabilities to attributes defined in object classes; parsers process the retrieved documents and cache them in XML format. The XML engine queries the cached documents, extracts data, and returns the extracted data for evaluation. The originality of this approach lies in a generic view mechanism for accessing data sources with limited data access and complex capabilities, and in an XML engine that supports data extraction and reorganization. The approach has been developed and demonstrated as part of a multi-database system that supports queries, via uniform Object Protocol Model interfaces, against public Web data sources of interest to biologists.
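The pipeline described in the abstract (search view, parser, XML cache, XML engine) can be illustrated with a small sketch. This is not the Object-Web Mediator's actual implementation: the names `SearchView`, `parse_flat_record`, `retrieve`, `extract`, and `fake_fetch`, the "KEY: value" document format, and the sample gene records are all assumptions made up for illustration, and the Web source is simulated in memory rather than fetched over HTTP.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SearchView:
    """Hypothetical search view: maps object-class attributes to the
    search parameters that a Web source actually supports."""
    source_name: str
    attribute_to_param: Dict[str, str]
    parser: Callable[[str], ET.Element]

    def build_query(self, conditions: Dict[str, str]) -> Dict[str, str]:
        # Reject conditions on attributes the source cannot search on
        # (a simple stand-in for "limited data access").
        unsupported = set(conditions) - set(self.attribute_to_param)
        if unsupported:
            raise ValueError(f"source cannot search on: {sorted(unsupported)}")
        return {self.attribute_to_param[a]: v for a, v in conditions.items()}


def parse_flat_record(document: str) -> ET.Element:
    """Toy parser (assumed format): turns 'KEY: value' lines into XML."""
    record = ET.Element("record")
    for line in document.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            ET.SubElement(record, key.strip().lower()).text = value.strip()
    return record


def fake_fetch(params: Dict[str, str]) -> List[str]:
    """Simulated Web source: returns documents matching every parameter."""
    documents = [
        "ID: G1\nNAME: brca1\nORGANISM: human",
        "ID: G2\nNAME: tp53\nORGANISM: mouse",
    ]
    return [
        d for d in documents
        if all(f"{k.upper()}: {v}" in d for k, v in params.items())
    ]


def retrieve(view: SearchView, conditions: Dict[str, str],
             fetch: Callable[[Dict[str, str]], List[str]],
             cache: ET.Element) -> None:
    """Retrieval component: query the source, parse, cache as XML."""
    for document in fetch(view.build_query(conditions)):
        cache.append(view.parser(document))


def extract(cache: ET.Element, tag: str) -> List[str]:
    """XML engine stand-in: pull one field out of every cached record."""
    return [el.text for el in cache.findall(f"./record/{tag}")]


# Usage: the object attribute "species" maps to the source parameter "organism".
view = SearchView("genes", {"species": "organism"}, parse_flat_record)
cache = ET.Element("cache")
retrieve(view, {"species": "human"}, fake_fetch, cache)
names = extract(cache, "name")
```

In this sketch the object-level query never mentions the source's own parameter names; the search view translates them and refuses conditions the source cannot evaluate, while extraction runs only against the XML cache.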

Keywords: Web data integration, Retrieval, Extraction, XML, Mediation, Web source capability, Biological data integration

Review history: Received 3 July 2002, Revised 3 July 2002, Accepted 3 July 2002, Available online 11 October 2002.

Official paper URL: https://doi.org/10.1016/S0169-023X(02)00143-X