ARIEX: Automated ranking of information extractors

Authors:

Highlights:

Abstract

Information extractors transform the user-friendly information in a web document into structured information that can be used to feed a knowledge-based system. Researchers are interested in ranking them to find out which one performs best. Unfortunately, many rankings in the literature are deficient. There are a number of formal methods to rank information extractors, but they also have many problems and have not been widely adopted. In this article, we present ARIEX, an automated method to rank web information extraction proposals. It does not have any of the problems that we have identified in the literature. Our proposal should help authors make sure that they have advanced the state of the art not only conceptually, but also empirically; it should also help practitioners make informed decisions about which proposal is the most adequate for a particular problem.

Keywords: Web documents, Information extraction, Ranking method, Automatisation

Article history: Received 16 April 2015, Revised 3 November 2015, Accepted 5 November 2015, Available online 30 November 2015, Version of Record 21 December 2015.

Official paper URL: https://doi.org/10.1016/j.knosys.2015.11.004