Sentence identification of biological interactions using PATRICIA tree generated patterns and genetic algorithm optimized parameters

Authors:

Abstract

An important task in information retrieval is identifying sentences that express significant relationships between key concepts. In this work, we propose a novel approach to automatically extract patterns of sentences that describe interactions involving concepts of molecular biology. A pattern is defined here as a sequence of specialized Part-of-Speech (POS) tags that captures the structure of key sentences in the scientific literature. Each candidate sentence for the classification task is encoded as a POS array and then aligned to a collection of pre-extracted patterns. The quality of the alignment is expressed as a pairwise alignment score. The most innovative component of this work is the use of a genetic algorithm (GA) to tune the alignment scoring scheme so that classification performance is maximized. The system achieves an average F-score of 0.796 in identifying sentences that describe interactions between co-occurring biological concepts. Performance depends mostly on the quality of preprocessing steps such as term identification and POS tagging.
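The alignment step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the alignment algorithm, so the sketch assumes a global (Needleman-Wunsch style) alignment with three hypothetical parameters (`match`, `mismatch`, `gap`), which stand in for the kind of values a GA might tune. The tag names (`PROTEIN`, `VB_INTERACT`, `DT`) and the decision threshold are also illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ScoringParams:
    """Hypothetical scoring parameters; a GA could search over values like these."""
    match: float = 2.0
    mismatch: float = -1.0
    gap: float = -1.0


def alignment_score(tags: Sequence[str], pattern: Sequence[str], p: ScoringParams) -> float:
    """Global alignment score between a sentence's POS array and one pattern."""
    # prev[j] holds the best score for aligning the tags seen so far with pattern[:j].
    prev = [j * p.gap for j in range(len(pattern) + 1)]
    for i, tag in enumerate(tags, start=1):
        curr = [i * p.gap]
        for j, pat in enumerate(pattern, start=1):
            diagonal = prev[j - 1] + (p.match if tag == pat else p.mismatch)
            skip_tag = prev[j] + p.gap          # sentence tag aligned to a gap
            skip_pattern = curr[j - 1] + p.gap  # pattern tag aligned to a gap
            curr.append(max(diagonal, skip_tag, skip_pattern))
        prev = curr
    return prev[-1]


def best_pattern_score(tags: Sequence[str], patterns: Sequence[Sequence[str]], p: ScoringParams) -> float:
    """Highest alignment score of the sentence against any pattern in the collection."""
    return max(alignment_score(tags, pattern, p) for pattern in patterns)


def is_interaction_sentence(tags: Sequence[str], patterns: Sequence[Sequence[str]],
                            p: ScoringParams, threshold: float) -> bool:
    """Classify a sentence as describing an interaction if its best score reaches the threshold."""
    return best_pattern_score(tags, patterns, p) >= threshold


# Illustrative data only: one pattern and two candidate sentences encoded as tag arrays.
patterns = [["PROTEIN", "VB_INTERACT", "PROTEIN"]]
params = ScoringParams()
candidate = ["PROTEIN", "VB_INTERACT", "DT", "PROTEIN"]
unrelated = ["DT", "NN"]
```

Under these assumptions, `is_interaction_sentence(candidate, patterns, params, threshold=4.0)` accepts the first sentence, because the single extra determiner costs only one gap penalty, while the second sentence is rejected. A GA-based tuner would treat the fields of `ScoringParams` (and possibly the threshold) as the chromosome and use the F-score on labeled sentences as the fitness function.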

Keywords: Biological interactions patterns, PATRICIA tree, Pattern matching, Interaction sentences, Genetic algorithm

Article history: Received 2 February 2009, Revised 14 September 2009, Accepted 14 September 2009, Available online 23 September 2009.

Paper URL: https://doi.org/10.1016/j.datak.2009.09.002