Preliminary exploration of topic modelling representations for Electronic Health Records coding according to the International Classification of Diseases in Spanish

Authors:

Highlights:

• Classification of Electronic Health Records in Spanish and English using XAI methods.

• Interpretable representation of the text as a fixed-length numerical vector.

• PLDA can discover topics associated with the ICD without a classifier.

• Experiments in Spanish and English to assess topic coherence and ICD association.

Keywords: Multi-label classification, Document classification, Electronic Health Records, ICD classification, Topic models, Partially Labelled Dirichlet Allocation

Review history: Received 30 April 2021, Revised 14 March 2022, Accepted 22 April 2022, Available online 6 May 2022, Version of Record 9 June 2022.

Official paper URL: https://doi.org/10.1016/j.eswa.2022.117303