MLT-DNet: Speech emotion recognition using 1D dilated CNN based on multi-learning trick approach

Authors:

Highlights:

• A lightweight model using a one-dimensional CNN is proposed for a real-time SER system.

• A multi-learning trick (MLT) is proposed that utilizes UFLBs and a stacked GRU setup.

• The proposed model has the distinctive ability to learn spatial and temporal features in parallel.

• A 1D dilated CNN architecture is explored to enhance the usage of features.

• We evaluated our model on benchmark corpora and improved on the current baseline methods.
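The highlights refer to a 1D dilated CNN without describing the operation itself. As a minimal sketch of what dilation means for a one-dimensional convolution, the following plain-Python example may help. It is not the authors' architecture: the function names `dilated_conv1d` and `receptive_field`, the toy signal, the kernel values, and the dilation rates are all hypothetical choices made for illustration.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid (unpadded) 1D dilated convolution, cross-correlation form.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if dilation < 1:
        raise ValueError("dilation must be >= 1")
    # Number of input samples covered by one output sample.
    span = (len(kernel) - 1) * dilation + 1
    if len(signal) < span:
        return []
    return [
        sum(kernel[k] * signal[i + k * dilation] for k in range(len(kernel)))
        for i in range(len(signal) - span + 1)
    ]


def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 dilated convolution layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Toy input and kernel (assumed values, chosen only to make the effect visible).
signal = [1, 2, 3, 4, 5, 6, 7, 8]
kernel = [1, 0, -1]

dense = dilated_conv1d(signal, kernel, dilation=1)    # taps at i, i+1, i+2
dilated = dilated_conv1d(signal, kernel, dilation=2)  # taps at i, i+2, i+4

print(dense)
print(dilated)
print(receptive_field(3, [1, 2, 4]))
```

With the same three-tap kernel, a dilation of 2 makes each output sample span five input samples instead of three, so stacking layers with growing dilation rates widens the temporal context without adding weights.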

Keywords: Affective computing, Dilated convolutional neural network, Real-time speech emotion recognition, Parallel learning, Multi-learning trick (MLT), Raw audio clips

Review history: Received 10 July 2020, Revised 26 October 2020, Accepted 26 October 2020, Available online 31 October 2020, Version of Record 10 February 2021.

Official paper URL: https://doi.org/10.1016/j.eswa.2020.114177