Minimally-supervised extraction of domain-specific part–whole relations using Wikipedia as knowledge-base

Abstract

We present a minimally-supervised approach for learning part–whole relations from texts. Unlike previous techniques, we focus on sparse, domain-specific texts. The novelty of our approach lies in the use of Wikipedia as a knowledge base. From it, a minimally-supervised algorithm first acquires a set of reliable patterns that express part–whole relations. We then use the acquired patterns to extract part–whole relation triples from a collection of sparse, domain-specific texts. This strategy of learning in one domain and applying the knowledge in another is based on the notion of domain adaptation. It allows us to overcome the challenges of learning the relations directly from the sparse, domain-specific corpus. Our experimental evaluations reveal that, despite its general-purpose nature, Wikipedia can be exploited as a source of knowledge for improving the performance of domain-specific part–whole relation extraction. As further contributions, we propose a mechanism that mitigates the negative impact of semantic drift on minimally-supervised algorithms. We also represent the patterns in the extracted relations using sophisticated syntactic structures that avoid the limitations of traditional surface-string representations. Finally, we show that domain-specific part–whole relations cannot be conclusively classified in existing taxonomies.
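The abstract describes a two-stage pipeline. First, part–whole patterns are bootstrapped from a general-purpose corpus. Second, those patterns are applied to a sparse domain corpus. The Python sketch below illustrates only that general idea, under stated assumptions. The toy sentences, the function names (`induce_pattern`, `match_pattern`, `bootstrap`, `extract`), the precision-style pattern score, and the top-k cutoff used as a drift guard are all hypothetical and not taken from the paper. For brevity, the sketch also uses plain token-sequence patterns. These are exactly the kind of surface-string representation the abstract says the authors replace with syntactic structures.

```python
"""Toy bootstrapping of part-whole patterns (illustrative sketch, not the paper's method)."""

# Hypothetical stand-in for Wikipedia sentences (the general-purpose source corpus).
WIKI_SENTENCES = [
    "the engine is part of the car",
    "the wheel is part of the bicycle",
    "the car has an engine",
    "the bicycle has a wheel",
    "the blade is part of the turbine",
    "the turbine has a blade",
    "the rotor is near the turbine",
]

# Hypothetical sparse, domain-specific target corpus.
DOMAIN_SENTENCES = [
    "the impeller is part of the pump",
    "the pump has a seal",
    "the valve is near the pump",
]


def induce_pattern(sentence, part, whole):
    """Turn a sentence mentioning a known (part, whole) pair into a slot pattern."""
    tokens = sentence.split()
    if part not in tokens or whole not in tokens:
        return None
    return " ".join("<P>" if t == part else "<W>" if t == whole else t for t in tokens)


def match_pattern(pattern, sentence):
    """Return (part, whole) if the sentence instantiates the pattern, else None."""
    p_toks, s_toks = pattern.split(), sentence.split()
    if len(p_toks) != len(s_toks):
        return None
    slots = {}
    for p, s in zip(p_toks, s_toks):
        if p in ("<P>", "<W>"):
            # A slot that occurs twice must bind the same word both times.
            if slots.setdefault(p, s) != s:
                return None
        elif p != s:
            return None
    if "<P>" not in slots or "<W>" not in slots:
        return None
    return slots["<P>"], slots["<W>"]


def bootstrap(corpus, seeds, iterations=2, top_k=2):
    """Alternate between inducing patterns from known pairs and harvesting new pairs."""
    pairs, patterns = set(seeds), set()
    for _ in range(iterations):
        # 1. Induce candidate patterns from sentences that mention a known pair.
        candidates = {induce_pattern(s, p, w) for s in corpus for p, w in pairs}
        candidates.discard(None)
        # 2. Score each candidate by the fraction of its matches that are already known pairs.
        scored = []
        for pat in candidates:
            matches = [m for m in (match_pattern(pat, s) for s in corpus) if m]
            if matches:
                precision = sum(m in pairs for m in matches) / len(matches)
                scored.append((precision, len(matches), pat))
        # 3. Keep only the top_k candidates per round (a simple, assumed guard against drift).
        scored.sort(reverse=True)
        patterns |= {pat for _, _, pat in scored[:top_k]}
        # 4. Apply the retained patterns to harvest new (part, whole) pairs.
        for pat in patterns:
            for s in corpus:
                m = match_pattern(pat, s)
                if m:
                    pairs.add(m)
    return patterns, pairs


def extract(patterns, corpus):
    """Apply learned patterns to a new corpus, yielding (part, 'part-of', whole) triples."""
    triples = set()
    for pat in patterns:
        for s in corpus:
            m = match_pattern(pat, s)
            if m:
                triples.add((m[0], "part-of", m[1]))
    return sorted(triples)


if __name__ == "__main__":
    learned, _ = bootstrap(WIKI_SENTENCES, {("engine", "car")})
    print(extract(learned, DOMAIN_SENTENCES))
    # prints [('impeller', 'part-of', 'pump'), ('seal', 'part-of', 'pump')]
```

Keeping only the highest-scoring candidates each round limits how far low-precision patterns can pull the harvested pairs off-topic. The paper proposes its own semantic-drift mechanism, which this cutoff does not attempt to reproduce.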

Keywords: Text mining, Ontology learning, Relation extraction, Question–answering systems, Knowledge management applications

Article history: Received 4 January 2011, Revised 6 July 2011, Accepted 25 June 2012, Available online 6 July 2012.

Official paper URL: https://doi.org/10.1016/j.datak.2012.06.004