Identifying humanitarian information for emergency response by modeling the correlation and independence between text and images

Highlights:

• A multimodal humanitarian information identification task is defined that accommodates the label inconsistency caused by modality independence while preserving modality correlation.

• CIMHIM, a multimodal humanitarian information identification model that simultaneously captures the correlation and independence between modalities, is proposed.

• Experiments on multiple datasets demonstrate the superiority of CIMHIM over both modality-correlated and modality-independent models.

• A manually annotated dataset tailored for the task is constructed.

Keywords: Humanitarian information, Emergency response, Multimodal deep learning, Text-image relationships, Multimodal fusion

Review history: Received 2 December 2021, Revised 30 April 2022, Accepted 6 May 2022, Available online 13 May 2022, Version of Record 13 May 2022.