Learning to extract and summarize hot item features from multiple auction web sites

Authors: Tak-Lam Wong, Wai Lam

Abstract

It is difficult to digest the poorly organized and vast amount of information contained in auction Web sites which are fast changing and highly dynamic. We develop a unified framework which can automatically extract product features and summarize hot item features from multiple auction sites. To deal with the irregularity in the layout format of Web pages and harness the uncertainty involved, we formulate the tasks of product feature extraction and hot item feature summarization as a single graph labeling problem using conditional random fields. One characteristic of this graphical model is that it can model the inter-dependence between neighbouring tokens in a Web page, tokens in different Web pages, as well as various information such as hot item features across different auction sites. We have conducted extensive experiments on several real-world auction Web sites to demonstrate the effectiveness of our framework.
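
To illustrate the idea of labeling tokens while modeling the dependence between neighbouring tokens, the following is a minimal sketch of Viterbi decoding over a linear chain, the simplest special case of a conditional random field. It is not the authors' model: the paper's graph also links tokens across Web pages and auction sites, which this sketch omits. The label set, the `emission` feature function, and all scores in `TRANSITION` are hypothetical values chosen by hand for illustration, whereas a real CRF would learn such weights from training data.

```python
# Hypothetical label set: "FEATURE" marks a product-feature token, "O" anything else.
LABELS = ["O", "FEATURE"]

# Hypothetical hand-set transition scores; a trained CRF would learn these.
# Neighbouring tokens that share a label are rewarded.
TRANSITION = {
    ("O", "O"): 0.5,
    ("O", "FEATURE"): -0.25,
    ("FEATURE", "O"): -0.25,
    ("FEATURE", "FEATURE"): 0.75,
}


def emission(token, label):
    """Toy per-token score (an assumption, not taken from the paper)."""
    looks_like_feature = (
        any(ch.isdigit() for ch in token) or token.lower() in {"gb", "mp", "zoom"}
    )
    if label == "FEATURE":
        return 1.5 if looks_like_feature else -1.0
    return -0.5 if looks_like_feature else 1.0


def viterbi(tokens):
    """Return the highest-scoring label sequence for a list of tokens."""
    if not tokens:
        return []
    # Best score of any labeling of the prefix that ends in each label.
    score = {lab: emission(tokens[0], lab) for lab in LABELS}
    backpointers = []
    for tok in tokens[1:]:
        new_score, ptr = {}, {}
        for lab in LABELS:
            best_prev = max(LABELS, key=lambda p: score[p] + TRANSITION[(p, lab)])
            new_score[lab] = (
                score[best_prev] + TRANSITION[(best_prev, lab)] + emission(tok, lab)
            )
            ptr[lab] = best_prev
        backpointers.append(ptr)
        score = new_score
    # Trace back from the best final label.
    path = [max(LABELS, key=lambda lab: score[lab])]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


print(viterbi(["Brand", "new", "8", "GB", "card"]))
```

Under these assumed scores, the decoder labels "8" and "GB" jointly as a feature span because the transition score favours adjacent tokens sharing a label, which is the kind of neighbouring-token dependence the abstract refers to.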

Keywords: Information extraction, Web mining, Conditional random fields

Review process:

Paper URL: https://doi.org/10.1007/s10115-007-0078-2