A generic framework and methodology for extracting semantics from co-occurrences

Authors:

Highlights:

Abstract

Extracting semantic associations from text corpora is an important problem with several applications. It is well understood that semantic associations in text can be discerned by observing patterns of co-occurrence among terms. However, much of the work in this direction has been piecemeal, addressing specific kinds of semantic associations. In this work, we propose a generic framework with which several kinds of semantic associations can be mined. The framework comprises a co-occurrence graph of terms, along with a set of graph operators. A methodology for using this framework is also proposed, in which the properties of a given semantic association can be hypothesized and tested over the framework. To show the generic nature of the proposed model, four different semantic associations are mined over a corpus of Wikipedia articles. The design of the proposed framework is inspired by cognitive science, specifically the interplay between semantic and episodic memory in humans.
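As a rough illustration of what a co-occurrence graph of terms with an operator over it might look like, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how the graph is built or what the operators are, so the whitespace tokenization, the document-level co-occurrence counting, and the function names `build_cooccurrence_graph` and `neighbors` are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence_graph(documents):
    """Map each unordered term pair to the number of documents containing both.

    Assumption: terms are lowercase whitespace-separated tokens, and each
    document counts as one co-occurrence context.
    """
    graph = Counter()
    for doc in documents:
        terms = sorted(set(doc.lower().split()))
        for a, b in combinations(terms, 2):
            graph[(a, b)] += 1
    return graph


def neighbors(graph, term):
    """Hypothetical operator: terms co-occurring with `term`, strongest first."""
    scores = Counter()
    for (a, b), weight in graph.items():
        if a == term:
            scores[b] += weight
        elif b == term:
            scores[a] += weight
    # Sort by descending weight, then alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy corpus, invented for this example only.
docs = [
    "semantic memory stores concepts",
    "episodic memory stores events",
    "semantic associations link concepts",
]
graph = build_cooccurrence_graph(docs)
top = neighbors(graph, "memory")
```

Under these assumptions, a hypothesis about a semantic association could be expressed as a combination of such operators and then checked against the graph, which is the general shape of the methodology the abstract describes.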

Keywords: Cognitive models, Co-occurrence, Data mining, Text mining

Review history: Received 15 June 2013, Revised 24 May 2014, Accepted 17 June 2014, Available online 27 June 2014.

Paper URL: https://doi.org/10.1016/j.datak.2014.06.002