Automatic generation of semantically enriched web pages by a text mining approach

作者:

Highlights:

摘要

Nowadays most of the Web pages contain little amount of structure and supporting information that can reveal their semantics or meanings. To enable automated processing of the Web pages, semantic information such as metadata and tags regarding to each page should be added to it. Several authoring tools have been developed to help users tackling this task. However, manual or semi-automatic authoring is implausible when we intend to annotate large amount of Web pages. In this work, we proposed a method to automatically generate some descriptive metadata and tags for a Web page. The idea is to apply the self-organizing map algorithm to cluster the Web pages and discover the relationships between these clusters. In the mean time, the themes of each cluster are also identified. We then use such relationships and themes to tag the Web pages and generate metadata for the Web pages. The result of experiments shows that our method may generate semantically relevant metadata and tags for the Web pages.

论文关键词:Metadata generation,Semantic tagging,Text mining,Self-organizing map

论文评审过程:Available online 20 February 2009.

论文官网地址:https://doi.org/10.1016/j.eswa.2009.02.022