Managing knowledge on the Web – Extracting ontology from HTML Web

Authors:

Abstract

In recent years, the Internet has become one of the most important sources of information, and it is now imperative that companies be able to collect, retrieve, process, and manage information from the Web. However, given the sheer amount of information available, browsing web content through keyword searches is inefficient, largely because unstructured HTML web pages are written for human comprehension and not for direct machine processing. For the same reason, the degree of web automation is limited. It is recognized that semantics can enhance web automation, but converting the current HTML Web into the Semantic Web will take an indefinite amount of effort. This study proposes a novel ontology extractor, called OntoSpider, for extracting ontology from the HTML Web. The contribution of this work is the design and implementation of a six-phase process for extracting ontology from unstructured HTML pages, comprising preparation, transformation, clustering, recognition, refinement, and revision. The extracted ontology provides structured and relevant information that can be compared and analyzed more effectively in applications such as e-commerce and knowledge management. We describe the system in detail and present a series of experimental results that validate the system design and illustrate the effectiveness of OntoSpider.

Keywords: Ontology, Semantic Web, Knowledge management applications, Intelligent Web services

Review history: Received 20 May 2008, Revised 9 February 2009, Accepted 18 February 2009, Available online 9 March 2009.

Paper URL: https://doi.org/10.1016/j.dss.2009.02.011