Learning (k,l)-contextual tree languages for information extraction from web pages

Authors: Stefan Raeymaekers, Maurice Bruynooghe, Jan Van den Bussche

Abstract

This paper introduces a novel method for learning a wrapper for extraction of information from web pages, based upon (k,l)-contextual tree languages. It also introduces a method to learn good values of k and l based on a few positive and negative examples. Finally, it describes how the algorithm can be integrated in a tool for information extraction.

Keywords: Information extraction, Wrapper induction, Tree languages

Paper URL: https://doi.org/10.1007/s10994-008-5049-7