A hybrid model for finding abbreviation–definition pairs from biomedical abstracts using heuristics-based sequence labeling and perceptron linear classifier
Authors:
Highlights:
• A hybrid model is introduced for extracting acronym-definition pairs from biomedical text.
• Heuristics-based sequence labeling is introduced for the pattern recognition task.
• Three-level mapping strategies are proposed for the sequence labeling task.
• Valid abbreviation-definition pairs are recognized by a perceptron linear classifier.
• Recently published PubMed abstracts are obtained from Thalia (a semantic search engine).
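The highlights name a perceptron linear classifier as the step that accepts or rejects candidate abbreviation-definition pairs. The following is a minimal sketch of that general idea, not the authors' implementation: the feature set, the function names (features, train_perceptron, predict) and the toy pairs are all assumptions made for illustration, and none of them come from the paper.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[str, str]  # (abbreviation, candidate definition)


def features(abbr: str, definition: str) -> List[float]:
    """Hypothetical features for a candidate pair (not taken from the paper)."""
    words = definition.split()
    letters = [c.lower() for c in abbr if c.isalnum()]
    low = definition.lower()

    # Fraction of abbreviation characters found, in order, in the definition.
    pos, matched = 0, 0
    for c in letters:
        found = low.find(c, pos)
        if found >= 0:
            matched += 1
            pos = found + 1
    in_order = matched / len(letters) if letters else 0.0

    # Does the first word of the definition start with the first abbreviation letter?
    first_match = 1.0 if letters and words and words[0][0].lower() == letters[0] else 0.0

    # Word count relative to abbreviation length, capped and scaled into [0, 1].
    length_ratio = min(len(words) / max(len(letters), 1), 2.0) / 2.0

    return [1.0, in_order, first_match, length_ratio]  # leading 1.0 acts as the bias


def score(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_perceptron(data: Sequence[Tuple[Pair, int]], epochs: int = 20) -> List[float]:
    """Classic perceptron rule: update the weights only on misclassified examples."""
    w = [0.0, 0.0, 0.0, 0.0]
    for _ in range(epochs):
        for (abbr, definition), label in data:  # label is +1 (valid) or -1 (invalid)
            x = features(abbr, definition)
            if label * score(w, x) <= 0:
                w = [wi + label * xi for wi, xi in zip(w, x)]
    return w


def predict(w: Sequence[float], abbr: str, definition: str) -> int:
    return 1 if score(w, features(abbr, definition)) > 0 else -1


# Toy training data, invented for this sketch.
TOY_DATA = [
    (("DNA", "deoxyribonucleic acid"), 1),
    (("DNA", "the results were"), -1),
]
```

Training on TOY_DATA with train_perceptron and then calling predict on a new candidate returns +1 for an accepted pair and -1 for a rejected one. A real system would need far more training pairs and richer features than this sketch uses.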
Keywords: Biomedical abbreviation-definition extraction, Text mining, Heuristics approach, Pattern recognition, Sequence labeling, Perceptron learning
Review history: Received 27 March 2020, Revised 12 September 2020, Accepted 23 September 2020, Available online 28 September 2020, Version of Record 9 October 2020.
Official paper URL: https://doi.org/10.1016/j.eswa.2020.114049