Supporting knowledge discovery for biodiversity

Authors:

Highlights:

• We introduce a methodology for extracting text semantics and summarizing them in a conceptual graph (CG).

• CGs are derived from dependency-based parsing that also uses constituency information.

• CGs act as indexes for information retrieval (IR), dealing with text incompleteness and vagueness.

• The resulting IR system is tested on a botanical corpus, using a topic set whose queries span different levels of difficulty.

• Conceptual retrieval outperforms classic retrieval at every level of query difficulty, especially the highest.
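The highlights describe the pipeline only at a high level. As a minimal sketch under stated assumptions (not the authors' method), dependency arcs can be mapped to CG arcs of the form (concept, relation, concept), those arcs indexed per document, and a query graph matched against the index. The dependency-to-relation mapping, the `partial_weight` used for arcs whose relation differs, and the names `to_conceptual_graph`, `build_index` and `rank` are all hypothetical; the parses are hand-written, and the constituency information the authors also use is not modeled.

```python
from collections import defaultdict

# Hypothetical mapping from dependency labels to conceptual relations;
# labels not listed here are ignored when building the graph.
DEP_TO_CONCEPTUAL = {
    "nsubj": "agent",
    "dobj": "object",
    "amod": "attribute",
}


def to_conceptual_graph(dependencies):
    """Map (head, label, dependent) dependency triples to CG arcs
    (concept, relation, concept), lower-casing the concepts."""
    graph = set()
    for head, label, dependent in dependencies:
        relation = DEP_TO_CONCEPTUAL.get(label)
        if relation is not None:
            graph.add((head.lower(), relation, dependent.lower()))
    return graph


def build_index(graphs):
    """Inverted index: (concept, concept) pair -> {doc_id: relations linking them}."""
    index = defaultdict(dict)
    for doc_id, graph in graphs.items():
        for source, relation, target in graph:
            index[(source, target)].setdefault(doc_id, set()).add(relation)
    return index


def rank(query_graph, index, partial_weight=0.5):
    """Score documents by matched query arcs: an exact arc counts 1.0, while the
    same concept pair under a different relation counts partial_weight, a crude
    tolerance for vague or incomplete text. Returns (doc_id, score), best first."""
    scores = defaultdict(float)
    for source, relation, target in query_graph:
        for doc_id, relations in index.get((source, target), {}).items():
            scores[doc_id] += 1.0 if relation in relations else partial_weight
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Toy corpus with hand-written parses (a real system would obtain them from a parser).
graphs = {
    "d1": to_conceptual_graph([("bears", "nsubj", "shrub"),
                               ("bears", "dobj", "flowers"),
                               ("flowers", "amod", "yellow")]),
    "d2": to_conceptual_graph([("cover", "nsubj", "leaves"),
                               ("cover", "dobj", "shrub"),
                               ("leaves", "amod", "yellow")]),
}
index = build_index(graphs)
query = to_conceptual_graph([("flowers", "amod", "yellow")])  # "yellow flowers"
print(rank(query, index))  # [('d1', 1.0)]
```

A keyword match on "yellow" would also retrieve d2 ("yellow leaves"), whereas matching at the level of concept-relation-concept arcs retrieves only d1, which illustrates why a graph index can be more selective than a bag of words.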


Keywords: Knowledge discovery, Natural language processing, Text mining

Review history: Received 1 September 2014, Accepted 26 August 2015, Available online 5 September 2015, Version of Record 10 November 2015.

Official paper URL: https://doi.org/10.1016/j.datak.2015.08.002