Multimodal fusion methods with deep neural networks and meta-information for aggression detection in surveillance

Authors:

Highlights:

• A DNN-based approach is proposed for aggression detection in surveillance.

• Four multimodal fusion methods are used with and without an intermediate level.

• Acoustic, visual and textual features are combined with meta-features.

• Linguistic and word affect features are used with properties of spontaneous speech.

• The different methods are validated on the dataset of aggression in trains.

Keywords: Aggression detection, Deep learning, Multimodal fusion, Audio–visual fusion, Text-based features, Meta-features

Article history: Received 3 March 2021, Revised 6 July 2022, Accepted 10 August 2022, Available online 13 August 2022, Version of Record 26 August 2022.

Paper URL: https://doi.org/10.1016/j.eswa.2022.118523