On learning web information extraction rules with TANGO

Authors:

Highlights:

• TANGO can be adapted to particular websites or to keep up with the evolution of HTML.

• It relies on an open catalogue of features and a highly configurable learning process.

• We provide a method to help re-configure our proposal to improve its effectiveness.

• It outperforms other state-of-the-art proposals in terms of effectiveness.

Keywords: Web information extraction, Semi-structured documents, Open catalogues of features, Learning rules, Variation points, Configuration method

Review history: Received 1 August 2015, Revised 29 March 2016, Accepted 23 May 2016, Available online 21 June 2016, Version of Record 16 July 2016.

Official paper URL: https://doi.org/10.1016/j.is.2016.05.003