Preliminary exploration of topic modelling representations for Electronic Health Records coding according to the International Classification of Diseases in Spanish
Authors:
Highlights:
• Classification of Electronic Health Records in Spanish and English using XAI methods.
• Interpretable representation of the text as a fixed-length numerical vector.
• PLDA can discover topics associated with the ICD without a classifier.
• Experiments in Spanish and English to assess topic coherence and ICD association.
Keywords: Multi-label classification, Document classification, Electronic Health Records, ICD classification, Topic models, Partially labelled Dirichlet allocation
Article history: Received 30 April 2021, Revised 14 March 2022, Accepted 22 April 2022, Available online 6 May 2022, Version of Record 9 June 2022.
Official paper URL: https://doi.org/10.1016/j.eswa.2022.117303