On validating web information extraction proposals

Authors:

Highlights:

• Web information extractors help gather structured data from web documents.

• Validating them amounts to computing performance measures and comparing the results.

• The state-of-the-art validation methodology has three important issues.

• We have studied these issues and proved that they can bias the results.

• We provide an extension to the methodology to address the issues.

Keywords: Web information extractors, Validation method

Review history: Received 17 November 2020, Revised 7 February 2022, Accepted 19 February 2022, Available online 19 March 2022, Version of Record 26 March 2022.

Official paper URL: https://doi.org/10.1016/j.eswa.2022.116700