Feature assisted stacked attentive shortest dependency path based Bi-LSTM model for protein–protein interaction

Abstract

Knowledge about protein–protein interactions is essential for understanding biological processes such as metabolic pathways, DNA replication, and transcription. However, most existing protein–protein interaction (PPI) systems depend primarily on the scientific literature, which is not yet accessible as a structured database. Efficient information extraction systems are therefore required to identify PPI information in large collections of biomedical text. In this paper, we present a novel method based on an attentive deep recurrent neural network that combines multiple levels of representation, exploiting word sequences and dependency-path information, to identify PPI information in text. We use a stacked attentive bi-directional long short-term memory (Bi-LSTM) network as the recurrent model. The model jointly models proteins and relations in a single unified framework, which we name the ‘Attentive Shortest Dependency Path LSTM’ (Att-sdpLSTM) model. We evaluated the proposed technique on five popular benchmark PPI datasets: AiMed, BioInfer, HPRD50, IEPA, and LLL. The model achieves F1-scores of 93.29%, 81.68%, 78.73%, 76.25%, and 83.92% on AiMed, BioInfer, HPRD50, IEPA, and LLL, respectively. Comparisons with existing systems show that the proposed approach attains state-of-the-art performance.
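
The abstract does not say how the shortest dependency path (SDP) between two protein mentions is obtained. As a rough illustration only, and not the authors' implementation, the sketch below treats a dependency parse as an undirected graph and finds the shortest path between two tokens with breadth-first search. The function name `shortest_dependency_path`, the placeholder tokens `PROT1` and `PROT2`, and the hand-written toy parse are all assumptions made for this example; a real pipeline would take its edges from a dependency parser.

```python
from collections import deque


def shortest_dependency_path(edges, source, target):
    """Return the tokens on the shortest path from source to target.

    edges is an iterable of (head, dependent) pairs. Direction is ignored,
    so the parse is searched as an undirected graph. Returns None when
    either token is absent or no path exists.
    """
    graph = {}
    for head, dep in edges:
        graph.setdefault(head, []).append(dep)
        graph.setdefault(dep, []).append(head)
    if source not in graph or target not in graph:
        return None

    parent = {source: None}  # doubles as the visited set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the parent links back to the source, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbour in graph[node]:
            if neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)
    return None


# Hypothetical, hand-written parse of:
# "PROT1 strongly interacts with PROT2 in yeast"
toy_edges = [
    ("interacts", "PROT1"),
    ("interacts", "strongly"),
    ("interacts", "PROT2"),
    ("PROT2", "with"),
    ("interacts", "yeast"),
    ("yeast", "in"),
]

sdp = shortest_dependency_path(toy_edges, "PROT1", "PROT2")
print(sdp)  # ['PROT1', 'interacts', 'PROT2']
```

In this toy case the path keeps only the two protein mentions and the verb that links them, dropping the adverb and the prepositional phrase. That pruning is the usual motivation for feeding SDP tokens, rather than the full sentence, to a sequence model such as a Bi-LSTM.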

Keywords: Relation extraction, Protein-protein interaction, Bi-directional long short term memory (Bi-LSTM), Stacked attention, Deep learning, Shortest dependency path, Support vector machine

Review history: Received 9 August 2017, Revised 10 November 2018, Accepted 15 November 2018, Available online 18 December 2018, Version of Record 23 January 2019.

Official paper URL: https://doi.org/10.1016/j.knosys.2018.11.020