A semantic approach for extracting domain taxonomies from text

Authors:

Highlights:

• We present a semantic approach for learning domain taxonomies from text.

• Word sense disambiguation is applied to text and to existing taxonomies.

• We refine the subsumption method for term relations to include concept semantics.

• We define new semantic measures for evaluating the built taxonomies.

• Our method performs well at capturing the broader–narrower inter-concept relation.

Abstract

In this paper we present a framework for automatically building a domain taxonomy from text corpora, called Automatic Taxonomy Construction from Text (ATCT). The framework comprises four steps. First, terms are extracted from a corpus of documents. Second, the terms most relevant to a specific domain are selected from the extracted terms using a filtering approach. Third, the selected terms are disambiguated by means of a word sense disambiguation technique, and concepts are generated. Finally, the broader–narrower relations between concepts are determined using a subsumption technique that relies on concept co-occurrences in text. We assess the performance of the ATCT framework using semantic precision, semantic recall, and the taxonomic F-measure, all of which take concept semantics into account. The framework is evaluated in the field of economics and management as well as in the medical domain.
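The final step, deriving broader–narrower relations from concept co-occurrences, can be illustrated with a minimal sketch of a plain document-level subsumption test. This is not the refined ATCT method described in the paper; it only shows the general idea that a concept x is treated as broader than y when x appears in most documents that contain y, but not the other way round. The function name `subsumption_pairs`, the `threshold` parameter and its default value, and the concept labels in the usage example are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import permutations


def subsumption_pairs(docs, threshold=0.8):
    """Return sorted (broader, narrower) concept pairs.

    docs: iterable of collections of concept identifiers, one per document.
    x is taken to subsume y when P(x | y) >= threshold and
    P(y | x) < P(x | y), with both probabilities estimated from
    document counts. The threshold value is an assumed parameter.
    """
    doc_freq = defaultdict(int)  # documents containing a concept
    co_freq = defaultdict(int)   # documents containing both concepts

    for concepts in docs:
        concepts = set(concepts)  # count each concept once per document
        for c in concepts:
            doc_freq[c] += 1
        for x, y in permutations(concepts, 2):
            co_freq[(x, y)] += 1

    pairs = []
    for (x, y), both in co_freq.items():
        p_x_given_y = both / doc_freq[y]
        p_y_given_x = both / doc_freq[x]
        if p_x_given_y >= threshold and p_y_given_x < p_x_given_y:
            pairs.append((x, y))
    return sorted(pairs)


# Hypothetical toy corpus: each document is reduced to its set of concepts.
docs = [
    {"finance", "loan"},
    {"finance", "loan"},
    {"finance", "stock"},
    {"finance"},
]
print(subsumption_pairs(docs))
# [('finance', 'loan'), ('finance', 'stock')]
```

In this toy corpus "finance" occurs in every document that mentions "loan" or "stock", while the reverse does not hold, so it is placed above both. A real pipeline would operate on disambiguated concepts rather than raw strings, as the third step of the framework suggests.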

Keywords: Taxonomy learning, Word sense disambiguation, Term extraction, Subsumption method, Semantic taxonomy evaluation

Article history: Received 12 December 2012, Revised 17 March 2014, Accepted 19 March 2014, Available online 27 March 2014.

Paper URL: https://doi.org/10.1016/j.dss.2014.03.006