Graph-based multi-label disease prediction model learning from medical data and domain knowledge

Authors:

Highlights:

Abstract

In recent years, the means of disease diagnosis and treatment have improved remarkably with the continuous development of science and technology. Researchers have spent tremendous time and effort building models that support medical practitioners in decision-making. However, one of the greatest challenges remains identifying the connections between different diseases. This study aims to discover the relationships between diseases and symptoms in order to predict potential diseases for patients. Treating the task as a multi-label classification problem, the study proposes a new multi-disease prediction model that learns from NHANES, an extensive health-related dataset, and MEDLINE, a corpus of medical domain knowledge. A heterogeneous information graph is first constructed and then populated with medical domain knowledge discovered from MEDLINE. The knowledge graph is then analysed to clarify the relevance between nodes in positive or negative space, which helps reveal the correlations among multiple diseases and their symptoms. A multi-label disease prediction model is then developed on top of the medical domain knowledge graph. Empirical experiments are conducted to evaluate the proposed model. The experimental results show that the proposed model outperforms state-of-the-art related work representing the mainstream approaches to multi-label classification. This study contributes a novel model for multi-disease prediction to the medical community and represents a new endeavour in multi-label classification using knowledge graphs.
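
The multi-label framing described above can be illustrated with a minimal Python sketch. It is not the model proposed in the paper: the symptom and disease names, the edge weights, the threshold, and the function `predict_diseases` are all hypothetical, and the signed weights are only a loose stand-in for positive and negative relevance between graph nodes. The sketch shows only that a patient's symptoms can map to zero, one, or several disease labels at once.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical signed edges (symptom, disease) -> weight.
# Positive weights mean the symptom supports the disease, negative weights
# mean it argues against it. Names and values are illustrative assumptions,
# not data from NHANES or MEDLINE.
EDGES: Dict[Tuple[str, str], float] = {
    ("high_glucose", "diabetes"): 0.9,
    ("high_blood_pressure", "hypertension"): 0.8,
    ("high_blood_pressure", "diabetes"): 0.3,
    ("normal_bmi", "diabetes"): -0.4,
}


def predict_diseases(symptoms: Set[str], threshold: float = 0.5) -> List[str]:
    """Return every disease whose summed edge weight reaches the threshold."""
    scores: Dict[str, float] = {}
    for (symptom, disease), weight in EDGES.items():
        if symptom in symptoms:
            # Accumulate evidence from each observed symptom.
            scores[disease] = scores.get(disease, 0.0) + weight
    # Multi-label output: any number of diseases may pass the threshold.
    return sorted(d for d, s in scores.items() if s >= threshold)


if __name__ == "__main__":
    print(predict_diseases({"high_glucose", "high_blood_pressure"}))
```

Because each disease is scored independently, the output is a set of labels rather than a single class, which is the property that distinguishes multi-label from multi-class prediction.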

Keywords: Multi-label classification, Knowledge graph, Medicine domain knowledge, Disease prediction, NHANES, MEDLINE

Review history: Received 28 September 2020, Revised 24 September 2021, Accepted 27 October 2021, Available online 2 November 2021, Version of Record 11 November 2021.

Official paper URL: https://doi.org/10.1016/j.knosys.2021.107662