Knowledge base population using semantic label propagation

Abstract

Training relation extractors for the purpose of automated knowledge base population requires the availability of sufficient training data. The amount of manual labeling can be significantly reduced by applying distant supervision, which generates training data by aligning large text corpora with existing knowledge bases. This typically results in a highly noisy training set, where many training sentences do not express the intended relation. In this paper, we propose to combine distant supervision with minimal human supervision by annotating features (in particular shortest dependency paths) rather than complete relation instances. Such feature labeling eliminates noise from the initial training set, resulting in a significant increase in precision at the expense of recall. We further improve on this approach by introducing the Semantic Label Propagation (SLP) method, which uses the similarity between low-dimensional representations of candidate training instances to re-expand the filtered training set, increasing recall while maintaining high precision. Our strategy is evaluated on an established test collection designed for knowledge base population (KBP) from the TAC KBP English slot filling task. The experimental results show that SLP leads to substantial performance gains when compared to existing approaches while requiring an almost negligible human annotation effort.
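The three stages summarized in the abstract (distant supervision, feature labeling on shortest dependency paths, and Semantic Label Propagation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each sentence is assumed to arrive with a precomputed shortest dependency path (SDP) string and a precomputed low-dimensional vector, and the names `Candidate`, `distant_supervision`, `filter_by_feature_labels` and `semantic_label_propagation`, the SDP notation, the cosine-similarity criterion and the 0.8 threshold are all hypothetical choices made for this example.

```python
import math
from dataclasses import dataclass, replace
from typing import Dict, List, Sequence, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object) fact in the KB


@dataclass(frozen=True)
class Candidate:
    """A sentence mentioning an entity pair (hypothetical structure)."""
    subj: str
    obj: str
    sdp: str                # shortest dependency path, assumed precomputed
    vec: Tuple[float, ...]  # low-dimensional representation, assumed precomputed
    relation: str = ""      # filled in by distant supervision


def distant_supervision(cands: List[Candidate], kb: Set[Triple]) -> List[Candidate]:
    """Label each sentence with every KB relation holding for its entity pair.

    Noisy by design: the sentence need not actually express the relation.
    """
    relations_by_pair: Dict[Tuple[str, str], List[str]] = {}
    for s, r, o in sorted(kb):  # sorted for deterministic output
        relations_by_pair.setdefault((s, o), []).append(r)
    return [replace(c, relation=r)
            for c in cands
            for r in relations_by_pair.get((c.subj, c.obj), [])]


def filter_by_feature_labels(
    labeled: List[Candidate], approved: Dict[str, Set[str]]
) -> Tuple[List[Candidate], List[Candidate]]:
    """Keep instances whose SDP a human approved for their relation; discard the rest."""
    kept: List[Candidate] = []
    discarded: List[Candidate] = []
    for c in labeled:
        if c.sdp in approved.get(c.relation, set()):
            kept.append(c)
        else:
            discarded.append(c)
    return kept, discarded


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def semantic_label_propagation(
    kept: List[Candidate], discarded: List[Candidate], threshold: float = 0.8
) -> List[Candidate]:
    """Re-admit discarded instances that are close to a kept instance of the
    same relation (assumed criterion: max cosine similarity >= threshold)."""
    recovered = []
    for c in discarded:
        seeds = [k.vec for k in kept if k.relation == c.relation]
        if seeds and max(cosine(c.vec, s) for s in seeds) >= threshold:
            recovered.append(c)
    return kept + recovered


# Toy usage with made-up entities and SDP strings.
kb = {("Alice", "per:spouse", "Bob")}
cands = [
    Candidate("Alice", "Bob", "nsubj<-married->dobj", (1.0, 0.0)),
    Candidate("Alice", "Bob", "nsubj<-wed->dobj", (0.9, 0.1)),
    Candidate("Alice", "Bob", "nsubj<-met->prep_at", (0.0, 1.0)),
]
approved = {"per:spouse": {"nsubj<-married->dobj"}}

labeled = distant_supervision(cands, kb)                       # 3 noisy instances
kept, discarded = filter_by_feature_labels(labeled, approved)  # 1 kept, 2 discarded
train = semantic_label_propagation(kept, discarded, threshold=0.8)
print([c.sdp for c in train])
# → ['nsubj<-married->dobj', 'nsubj<-wed->dobj']
```

In this toy run, feature labeling alone keeps only the "married" sentence (high precision, low recall); the propagation step then re-admits the "wed" sentence because its vector lies close to the approved one, while the "met" sentence stays out. The threshold is the knob that trades recall against precision in this sketch: lowering it re-admits more of the discarded, distantly supervised instances.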

Keywords: Relation extraction, Knowledge base population, Distant supervision, Active learning, Semi-supervised learning

Article history: Received 15 November 2015, Revised 3 May 2016, Accepted 8 May 2016, Available online 10 May 2016, Version of Record 12 August 2016.

Paper URL: https://doi.org/10.1016/j.knosys.2016.05.015