Utilizing the multiple facets of WWW contents

Authors:

Abstract

Current query languages for the Web (e.g., W3QL, WebLog and WebSQL) explore the structure of the Web. However, that structure usually has little to do with the semantics of the data, so in practice it is difficult to pose database queries over the Web. We introduce a new type of tag for denoting the semantics of data stored in HTML pages. These semantic tags, implemented as HTML comments, superimpose semistructured objects in the style of the OEM model on HTML pages. The paper discusses two implemented tools that make full use of these semantics. The first is a visualization tool that displays both the HTML reading and the OEM reading of Web pages. The second is a query language, similar to LOREL, that can query the HTML structure, the OEM reading, or both. Together, the formalism and tools provide data-modeling capabilities for the Web that fit its heterogeneous nature. Real database queries that take the OEM point of view can be formulated, including queries about the schema, as well as queries about the HTML structure of Web pages. The query language is therefore not restricted to portions of the Web in which semantic tags are used.
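
To make the idea of semantic tags carried in HTML comments more concrete, the following is a minimal Python sketch of how an OEM-style reading might be extracted from a page. It is an illustration under stated assumptions, not the paper's implementation: the comment syntax `oem:begin <label>` / `oem:end`, the names `SemanticTagParser`, `to_oem` and `read_oem`, and the sample page are all hypothetical, since the abstract does not specify the actual tag format.

```python
from html.parser import HTMLParser


class SemanticTagParser(HTMLParser):
    """Builds a labeled tree from hypothetical 'oem:begin'/'oem:end' comments."""

    def __init__(self):
        super().__init__()
        self.root = {"label": "root", "children": [], "text": ""}
        self.stack = [self.root]

    def handle_comment(self, data):
        parts = data.split()
        if len(parts) == 2 and parts[0] == "oem:begin":
            # Open a new semantic object nested in the current one.
            node = {"label": parts[1], "children": [], "text": ""}
            self.stack[-1]["children"].append(node)
            self.stack.append(node)
        elif parts == ["oem:end"] and len(self.stack) > 1:
            # Close the innermost open semantic object.
            self.stack.pop()
        # Any other comment is ordinary HTML and is ignored.

    def handle_data(self, data):
        # Text is attributed to the innermost open semantic object.
        self.stack[-1]["text"] += data


def to_oem(node):
    """Atomic objects become strings; complex objects map labels to value lists."""
    if not node["children"]:
        return node["text"].strip()
    result = {}
    for child in node["children"]:
        result.setdefault(child["label"], []).append(to_oem(child))
    return result


def read_oem(html):
    """Return the OEM-style reading of a page annotated with the assumed tags."""
    parser = SemanticTagParser()
    parser.feed(html)
    parser.close()
    return to_oem(parser.root)


# Hypothetical sample page; the HTML markup and the semantic tags are independent.
PAGE = """
<html><body>
<!-- oem:begin restaurant -->
<h1><!-- oem:begin name -->Chez Nous<!-- oem:end --></h1>
<p>Cuisine: <!-- oem:begin cuisine -->French<!-- oem:end --></p>
<!-- oem:end -->
</body></html>
"""

oem_view = read_oem(PAGE)
```

Because browsers ignore comments, the HTML reading of such a page is unchanged, while a tool that understands the comments can recover a nested, labeled object view. A query language in the spirit of the one described could then operate on either view.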

Keywords: Lorel, OEM, OHTML, Query language, Semantic tags, Semistructured data, WWW, W3LOREL

Article history: Received 3 April 1998, Revised 3 April 1998, Accepted 3 April 1998, Available online 10 February 1999.

Official paper URL: https://doi.org/10.1016/S0169-023X(98)00026-3