A semantic approach for extracting domain taxonomies from text

Authors:

Highlights:

• We present a semantic approach for learning domain taxonomies from text.

• Word sense disambiguation is applied to both the text and existing taxonomies.

• We refine the subsumption method for term relations to include concept semantics.

• We define new semantic measures for evaluating the built taxonomies.

• Our method performs well at capturing the broader–narrower inter-concept relation.

Abstract

In this paper we present Automatic Taxonomy Construction from Text (ATCT), a framework for automatically building a domain taxonomy from text corpora. The framework comprises four steps. First, terms are extracted from a corpus of documents. Second, the terms most relevant to a specific domain are selected from the extracted terms using a filtering approach. Third, the selected terms are disambiguated by means of a word sense disambiguation technique, and concepts are generated. Finally, the broader–narrower relations between concepts are determined using a subsumption technique that makes use of concept co-occurrences in text. We assess the performance of the ATCT framework using semantic precision, semantic recall, and the taxonomic F-measure, all of which take concept semantics into account. The proposed framework is evaluated in the field of economics and management as well as in the medical domain.
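To make the final step more concrete, the following is a minimal sketch of the classic document co-occurrence subsumption test, in which a concept x is taken to be broader than y when P(x|y) meets a threshold and P(y|x) < 1. It is not the refined, semantics-aware variant the paper proposes, whose details the abstract does not give. The function name `subsumption_pairs`, the threshold of 0.8, and the toy documents are assumptions chosen for illustration.

```python
from itertools import permutations


def subsumption_pairs(doc_concepts, threshold=0.8):
    """Return (broader, narrower) concept pairs from per-document concept sets.

    Illustrative assumption: x subsumes y when P(x|y) >= threshold and
    P(y|x) < 1, with probabilities estimated from document co-occurrence.
    """
    # Document frequency: number of documents that mention each concept.
    df = {}
    for concepts in doc_concepts:
        for c in concepts:
            df[c] = df.get(c, 0) + 1

    pairs = []
    for x, y in permutations(sorted(df), 2):
        # Number of documents in which both concepts occur.
        co = sum(1 for concepts in doc_concepts if x in concepts and y in concepts)
        p_x_given_y = co / df[y]
        p_y_given_x = co / df[x]
        if p_x_given_y >= threshold and p_y_given_x < 1.0:
            pairs.append((x, y))
    return pairs


# Hypothetical toy corpus: each document is the set of concepts found in it.
docs = [
    {"finance", "bank"},
    {"finance", "bank", "loan"},
    {"finance", "stock"},
    {"finance"},
]
pairs = subsumption_pairs(docs)
print(pairs)
```

On this toy corpus the sketch also returns the transitive pair (finance, loan) alongside (finance, bank) and (bank, loan); a full taxonomy builder would typically prune such indirect links so that each concept keeps only its closest broader concept.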

Keywords: Taxonomy learning, Word sense disambiguation, Term extraction, Subsumption method, Semantic taxonomy evaluation

Article history: Received 12 December 2012, Revised 17 March 2014, Accepted 19 March 2014, Available online 27 March 2014.

Official URL: https://doi.org/10.1016/j.dss.2014.03.006