Finding a representative subset from large-scale documents

Authors:

Highlights:

• A representative information extraction framework is proposed.

• Coverage, redundancy, and distribution consistency are considered in the framework.

• An extraction method RepExtract is proposed to find a representative subset.

• Extensive experiments are conducted to demonstrate the superiority of RepExtract.

Keywords: Information extraction method, Coverage, Redundancy, Distribution consistency

Review history: Received 4 March 2016, Revised 23 May 2016, Accepted 23 May 2016, Available online 13 June 2016, Version of Record 13 June 2016.

Official paper URL: https://doi.org/10.1016/j.joi.2016.05.003