Substituting clinical features using synthetic medical phrases: Medical text data augmentation techniques

Authors:

Highlights:

• This study presents two new approaches to fully automatic augmentation of medical notes, based on an ontology and a dictionary.

• The SciName method uses UMLS for data augmentation by replacing expressions in the documents with their scientific names.

• The SynName+SciName method generates documents by combining the SciName method with WordNet-based replacement of words by their synonyms.

• The proposed methods improved the performance of CNN, RNN, and HAN models by providing more instances in the training stage.
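The two substitution strategies in the highlights can be illustrated with a minimal sketch. It assumes toy, hypothetical dictionaries (`SCI_NAMES`, `SYNONYMS`) in place of real UMLS and WordNet lookups, and the function names are illustrative only, not the authors' code. It also assumes that the SynName+SciName variant applies both substitutions to the same document and that each augmented copy keeps the label of its source note.

```python
import re
from typing import Dict, List, Tuple

# HYPOTHETICAL toy dictionaries standing in for the real resources.
# SCI_NAMES imitates a UMLS lookup (everyday phrase -> scientific name);
# SYNONYMS imitates WordNet (word -> one synonym). Entries are illustrative only.
SCI_NAMES: Dict[str, str] = {
    "heart attack": "myocardial infarction",
    "high blood pressure": "hypertension",
}
SYNONYMS: Dict[str, str] = {
    "reported": "described",
    "severe": "serious",
}


def sci_name(text: str) -> str:
    """SciName-style substitution: swap known phrases for scientific names."""
    # Longest phrases first so multi-word expressions are replaced whole.
    for phrase in sorted(SCI_NAMES, key=len, reverse=True):
        pattern = r"\b" + re.escape(phrase) + r"\b"
        text = re.sub(pattern, SCI_NAMES[phrase], text)
    return text


def syn_name(text: str) -> str:
    """Synonym substitution on whitespace-separated tokens."""
    return " ".join(SYNONYMS.get(tok, tok) for tok in text.split())


def syn_plus_sci(text: str) -> str:
    """SynName+SciName-style variant (assumed order): scientific names, then synonyms."""
    return syn_name(sci_name(text))


def augment(docs: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return the labeled training docs plus new, label-preserving variants."""
    out = list(docs)
    seen = {text for text, _ in docs}
    for text, label in docs:
        for variant in (sci_name(text), syn_plus_sci(text)):
            if variant not in seen:  # keep only genuinely new documents
                seen.add(variant)
                out.append((variant, label))
    return out


if __name__ == "__main__":
    note = "patient reported severe chest pain after a heart attack"
    for text, label in augment([(note, "cardiology")]):
        print(label, "|", text)
```

With the toy note above, `augment` returns the original plus two new training instances ("... after a myocardial infarction" and "patient described serious chest pain after a myocardial infarction"). Notes with no dictionary matches produce no extra copies, which is one way the enlarged training set for the CNN, RNN, and HAN classifiers could avoid exact duplicates.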


Keywords: Unified Medical Language System, Natural language processing, Machine learning, Data augmentation, Medical document classification

Article history: Received 15 April 2021, Revised 2 September 2021, Accepted 3 September 2021, Available online 10 September 2021, Version of Record 14 September 2021.

Paper URL: https://doi.org/10.1016/j.artmed.2021.102167