Focused crawling enhanced by CBP–SLC

Authors:

Highlights:

• A heuristic-based approach, CBP–SLC, is presented for enhancing focused crawling.

• A weighted voting classifier using the TFIPNDF feature weighting approach is built.

• 1-DNFC identifies more reliable negative documents from the set of unlabeled examples.

Keywords: Focused crawling, DOM tree, TFIPNDF, CBP–SLC, WVC, Tunneling

Review history: Received 1 January 2013, Revised 24 May 2013, Accepted 13 June 2013, Available online 11 July 2013.

Official paper URL: https://doi.org/10.1016/j.knosys.2013.06.008