Supporting knowledge discovery for biodiversity

Authors:

Highlights:

• We introduce a methodology for extracting the semantics of a text and summarizing it in a conceptual graph (CG).

• CGs are derived from a dependency-based parsing that also uses constituency information.

• CGs act as indexes for information retrieval (IR), dealing with text incompleteness and vagueness.

• The resulting IR system is tested on a botanical corpus using a topic set whose queries span different levels of difficulty.

• Conceptual retrieval performs better than classic retrieval regardless of the level of difficulty, particularly at the high level.
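
The idea of using graphs as retrieval indexes can be illustrated with a minimal sketch. It is not the authors' system: the dependency triples are hand-written rather than produced by a parser, and the names `DOCS`, `build_graph`, `build_index`, and `search`, as well as the scoring weights, are hypothetical choices made for this illustration. Each document is reduced to a set of (concept, relation, concept) edges, and both edges and single concepts are indexed so that a query matching only part of a graph still retrieves the document.

```python
from collections import defaultdict

# Hypothetical dependency triples (head, relation, dependent), hand-written
# for illustration; a real system would obtain them from a parser.
DOCS = {
    "d1": [("has", "nsubj", "tree"), ("has", "dobj", "leaf"), ("leaf", "amod", "oval")],
    "d2": [("grows", "nsubj", "shrub"), ("grows", "prep_in", "forest")],
}


def build_graph(triples):
    """Return a set of (concept, relation, concept) edges: a toy conceptual graph."""
    return {(head, rel, dep) for head, rel, dep in triples}


def build_index(docs):
    """Map every concept and every edge to the documents whose graph contains it."""
    index = defaultdict(set)
    for doc_id, triples in docs.items():
        for edge in build_graph(triples):
            head, _, dep = edge
            index[head].add(doc_id)
            index[dep].add(doc_id)
            index[edge].add(doc_id)
    return index


def search(index, query_triples):
    """Score documents: 2 points per matching edge, 1 point per matching concept.

    The weights are arbitrary assumptions; they only make full edge matches
    rank above partial (concept-only) matches.
    """
    scores = defaultdict(int)
    for edge in build_graph(query_triples):
        head, _, dep = edge
        for doc_id in index.get(edge, ()):
            scores[doc_id] += 2
        for concept in (head, dep):
            for doc_id in index.get(concept, ()):
                scores[doc_id] += 1
    # Highest score first; ties broken by document id for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


INDEX = build_index(DOCS)
```

Calling `search(INDEX, [("leaf", "amod", "oval")])` ranks `d1` alone, because the whole edge is present in its graph. A query such as `[("grows", "nsubj", "tree")]` matches no edge but still returns both documents through the shared concepts, which is one simple way to tolerate incomplete or vague queries.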

Abstract

We introduce a methodology for extracting the semantics of a text and summarizing it in a conceptual graph (CG). CGs are derived from dependency-based parsing that also uses constituency information. They act as indexes for information retrieval (IR), dealing with text incompleteness and vagueness. The resulting IR system is tested on a botanical corpus using a topic set whose queries span different levels of difficulty. Conceptual retrieval performs better than classic retrieval regardless of the level of difficulty, particularly at the high level.

Keywords: Knowledge discovery, Natural language processing, Text mining

Review history: Received 1 September 2014, Accepted 26 August 2015, Available online 5 September 2015, Version of Record 10 November 2015.

Official URL: https://doi.org/10.1016/j.datak.2015.08.002