Context-aware profiling of concepts from a semantic topological space

Abstract

In the era of the Internet of Everything, natural-language text is still the undisputed medium for representing information, as evidenced by the pervasiveness of tweets, instant messages, posts, and documents. There is an increasing need for innovative technologies targeted at more machine-oriented communication. Many keyword-based and statistical approaches have supported information retrieval, data mining, and natural language processing systems, but a deeper understanding of text remains an urgent challenge: identifying concepts, the semantic relationships among them, and the contextual information needed for concept disambiguation requires further progress in textual-information management. This work introduces a novel technique for extracting the main concepts from a text. Concepts are described by word-based connections arranged in a semantic topological space, built on a formal model, the simplicial complex. The complex links points, i.e., the words appearing in the text, and incrementally creates a geometric structure that describes concepts that are more or less specialized, depending on the aggregation distance of the words. The conceptual network is context-aware, since it reveals unambiguous concepts, specialized through the analysis of the surrounding text. The framework that implements the approach discovers basic concepts, composed of the minimal number of words needed to describe a finite-sense concept, and richer extended concepts, built by adding further relations among terms. The final topological space provides a multi-granular concept representation: from a local, word-closeness view to a highly refined description. Experiments and comparative analysis validate the effectiveness of the approach, showing satisfactory performance in concept identification: precision is greater than 80% in most of the experiments, and recall averages around 60–70%, with peaks of 90% for some specific concept categories.
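The abstract does not spell out how the simplicial complex is built. As a minimal sketch under stated assumptions, the idea of words as points that are linked into simplices depending on an aggregation distance can be illustrated with a Vietoris–Rips-style (clique) complex: a set of words forms a simplex when every pair of them lies within a threshold `eps`. Here the distance between two words is assumed to be the smallest token gap between any of their occurrences, and `eps` stands in for the aggregation distance. Both are illustrative choices, not the paper's definitions, and the function names `word_distances`, `clique_complex`, and `maximal_simplices` are hypothetical.

```python
from itertools import combinations


def word_distances(tokens):
    """Smallest positional gap between every pair of distinct words.

    Assumption (not from the paper): word "distance" is the minimum
    token gap between any two occurrences of the two words.
    """
    positions = {}
    for i, tok in enumerate(tokens):
        positions.setdefault(tok, []).append(i)
    words = sorted(positions)
    dist = {}
    for a, b in combinations(words, 2):  # a < b because words is sorted
        dist[(a, b)] = min(abs(i - j) for i in positions[a] for j in positions[b])
    return words, dist


def clique_complex(words, dist, eps, max_dim=2):
    """Vietoris-Rips style complex: a word set is a simplex when every
    pair of its words lies within distance eps (up to dimension max_dim)."""
    simplices = [frozenset([w]) for w in words]  # 0-simplices: single words
    for k in range(2, max_dim + 2):  # k words -> (k-1)-dimensional simplex
        # combo is drawn from sorted words, so its pairs match the dist keys
        for combo in combinations(words, k):
            if all(dist[pair] <= eps for pair in combinations(combo, 2)):
                simplices.append(frozenset(combo))
    return simplices


def maximal_simplices(simplices):
    """Simplices not contained in a larger one: hypothetical 'concept' candidates."""
    return {s for s in simplices if not any(s < t for t in simplices)}


if __name__ == "__main__":
    tokens = ["bank", "loan", "rate", "river", "flood"]
    words, dist = word_distances(tokens)
    for eps in (1, 2):
        concepts = maximal_simplices(clique_complex(words, dist, eps))
        print(eps, sorted(sorted(s) for s in concepts))
    # 1 [['bank', 'loan'], ['flood', 'river'], ['loan', 'rate'], ['rate', 'river']]
    # 2 [['bank', 'loan', 'rate'], ['flood', 'rate', 'river'], ['loan', 'rate', 'river']]
```

Raising `eps` from 1 to 2 merges adjacent word pairs into triangles, loosely mirroring how the aggregation distance controls how specialized the resulting word groupings are. A realistic version would also need stop-word filtering and the paper's own distance and context model, which the abstract does not describe.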

Keywords: Concept learning, Simplicial complex, Semantic topological space, Context-based concepts

Article history: Received 8 June 2016, Revised 11 May 2017, Accepted 13 May 2017, Available online 15 May 2017, Version of Record 6 June 2017.

DOI: https://doi.org/10.1016/j.knosys.2017.05.008