Audio classification using attention-augmented convolutional neural network

Abstract

Audio classification, as a set of important and challenging tasks, groups speech signals according to speakers' identities, accents, and emotional states. Due to the high dimensionality of audio data, task-specific hand-crafted feature extraction is usually required and is regarded as cumbersome for various audio classification tasks. More importantly, the inherent relationships among features have not been fully exploited. In this paper, the original speech signal is first represented as a spectrogram, which is then split along the frequency axis to form a frequency-distributed spectrogram. This paper proposes a task-independent model, called FreqCNN, to automatically extract distinctive features from each frequency band using convolutional kernels. Furthermore, an attention mechanism is introduced to systematically enhance the features from certain frequency bands. The proposed FreqCNN is evaluated on three publicly available speech databases through three independent classification tasks. The obtained results demonstrate superior performance over the state of the art.
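The abstract only outlines the architecture, but the core idea (split the spectrogram into frequency bands, run a CNN on each band, and let attention re-weight the per-band features before classification) can be sketched concretely. Below is a minimal, hypothetical PyTorch sketch of that idea; the band count, layer sizes, class name `FreqCNNSketch`, and the softmax attention form are illustrative assumptions, not the paper's actual specification.

```python
# Hypothetical sketch of the FreqCNN idea from the abstract (not the
# authors' implementation): per-band CNNs plus attention over bands.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FreqCNNSketch(nn.Module):
    def __init__(self, n_bands=4, n_classes=10):
        super().__init__()
        self.n_bands = n_bands
        # One small CNN per frequency band (structure assumed).
        self.band_cnns = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(1, 16, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),  # pool each band to a 16-d feature
            )
            for _ in range(n_bands)
        ])
        # Attention: one scalar score per band, softmax-normalized,
        # so informative bands are systematically weighted up.
        self.attn = nn.Linear(16, 1)
        self.classifier = nn.Linear(16, n_classes)

    def forward(self, spec):
        # spec: (batch, 1, freq, time) spectrogram.
        # Split along the frequency axis into equal bands.
        bands = torch.chunk(spec, self.n_bands, dim=2)
        feats = torch.stack(
            [cnn(b).flatten(1) for cnn, b in zip(self.band_cnns, bands)],
            dim=1,
        )  # (batch, n_bands, 16)
        weights = F.softmax(self.attn(feats), dim=1)  # (batch, n_bands, 1)
        pooled = (weights * feats).sum(dim=1)  # attention-weighted sum
        return self.classifier(pooled)


# Usage: a batch of 8 spectrograms, 128 frequency bins, 200 frames.
logits = FreqCNNSketch()(torch.randn(8, 1, 128, 200))
print(logits.shape)  # torch.Size([8, 10])
```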

Keywords: Audio classification, Spectrograms, Convolutional neural networks, Attention mechanism

Article history: Received 8 March 2018, Revised 17 May 2018, Accepted 24 July 2018, Available online 26 July 2018, Version of Record 31 October 2018.

DOI: https://doi.org/10.1016/j.knosys.2018.07.033