Multimodal channel-wise attention transformer inspired by multisensory integration mechanisms of the brain

Authors:

Highlights:

• A novel multimodal integration block, the multimodal channel-wise attention transformer (MCAT), is proposed, inspired by multisensory integration mechanisms in the brain.

• The MCAT block improves the performance of early-fusion long short-term memory (EF-LSTM) neural networks on a fine-grained bird recognition task and of the multimodal transformer (MulT) on an emotion recognition task.

• The proposed multi-head attention module is shown to be critical for effectively integrating the visual and auditory modalities.

• The MCAT block is in accordance with the principle of inverse effectiveness of multisensory integration in the brain.

Keywords: Multisensory integration, Top-down attention, Multimodal transformer, Fine-grained bird recognition, Emotion recognition

Review history: Received 28 October 2020, Revised 30 May 2022, Accepted 6 June 2022, Available online 7 June 2022, Version of Record 10 June 2022.

Official paper URL: https://doi.org/10.1016/j.patcog.2022.108837