Identifying entities from scientific publications: A comparison of vocabulary- and model-based methods

Authors:

Highlights:

• Five vocabulary- and model-based methods to extract terms from scientific publications are evaluated.

• Three conditional random fields (CRF)-based methods outperform the two vocabulary-based ones.

• The CRF method with a keyword-based dictionary performs best.

• The keyword-based method has higher recall, whereas the Wikipedia-based method has higher precision.
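To make the precision/recall contrast in the highlights concrete, the following is a minimal sketch of a vocabulary-based extractor and its evaluation. It is not the authors' implementation: the greedy longest-match strategy, the function names `extract_terms` and `precision_recall`, the `max_len` parameter, and the toy vocabulary are all assumptions made for illustration.

```python
def extract_terms(tokens, vocabulary, max_len=4):
    """Greedy longest-match lookup of vocabulary phrases in a token list.

    Assumption: the vocabulary is a set of lowercase phrases whose words
    are separated by single spaces.
    """
    found = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest window first so a multi-word phrase wins
        # over any shorter phrase starting at the same position.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n]).lower()
            if candidate in vocabulary:
                match = (candidate, n)
                break
        if match:
            found.append(match[0])
            i += match[1]  # skip past the matched phrase
        else:
            i += 1
    return found


def precision_recall(predicted, gold):
    """Set-based precision and recall of predicted terms against gold terms."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy data, for illustration only.
tokens = "We train conditional random fields for entity extraction".split()
vocabulary = {"conditional random fields", "entity extraction", "train"}
gold = {"conditional random fields", "entity extraction"}

predicted = extract_terms(tokens, vocabulary)
precision, recall = precision_recall(predicted, gold)
print(predicted, precision, recall)
```

In this toy case the broad vocabulary recovers both gold terms (full recall) but also emits the spurious term "train", which lowers precision. That is the kind of trade-off the last highlight describes.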

Keywords: Entity extraction, Vocabulary, Dictionary, Conditional random fields, Content-aware

Review history: Received 5 November 2014, Revised 22 April 2015, Accepted 22 April 2015, Available online 16 May 2015, Version of Record 16 May 2015.

Official paper URL: https://doi.org/10.1016/j.joi.2015.04.003