Constructing a reliable Web graph with information on browsing behavior

Authors:

Highlights:

Abstract

Page quality estimation is one of the greatest challenges for Web search engines. Hyperlink analysis algorithms such as PageRank and TrustRank are usually adopted for this task. However, low-quality, unreliable, and even spam data in the Web hyperlink graph makes it increasingly difficult to estimate page quality effectively. By analyzing large-scale user browsing behavior logs, we found that a more reliable Web graph can be constructed by incorporating browsing behavior information. The experimental results show that hyperlink graphs constructed with the proposed methods are much smaller than the original graph. In addition, algorithms based on the proposed “surfing with prior knowledge” model obtain better estimation results with these graphs for both high-quality page identification and spam page identification. Hyperlink graphs constructed with the proposed methods thus allow Web page quality to be evaluated more precisely and with less computational effort.

Keywords: Web graph, Quality estimation, Hyperlink analysis, User behavior analysis, PageRank

Review history: Received 17 March 2010, Revised 30 May 2012, Accepted 13 June 2012, Available online 23 June 2012.

Official paper URL: https://doi.org/10.1016/j.dss.2012.06.001