Bagged support vector machines for emotion recognition from speech

Authors:

Highlights:

Abstract

Speech emotion recognition, a promising problem in the field of Human-Computer Interaction, has been studied for several decades. It concerns the task of recognizing a speaker’s emotions from recordings of their speech. Recognizing emotions from speech can go a long way toward determining a person’s physical and psychological well-being. In this work we perform emotion classification on three corpora: the Berlin EmoDB, the Indian Institute of Technology Kharagpur Simulated Emotion Hindi Speech Corpus (IITKGP-SEHSC), and the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). A combination of spectral features was extracted from each corpus, then further processed and reduced to the required feature set. Ensemble learning has been shown to give superior performance compared to single estimators. We propose a bagged ensemble of support vector machines with a Gaussian kernel as a viable algorithm for the problem at hand, and we report the results obtained on the three datasets mentioned above.
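
As a rough illustration of the approach the abstract names, a bagged ensemble of Gaussian-kernel (RBF) support vector machines could be sketched with scikit-learn as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the use of scikit-learn, the synthetic feature matrix standing in for the reduced spectral features, the feature scaling step, the number of estimators, and the train/test split are all choices made here for illustration and are not taken from the paper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# ASSUMPTION: synthetic data standing in for a reduced spectral feature
# matrix (rows = utterances, columns = features). This is not speech data.
X, y = make_classification(
    n_samples=300,
    n_features=20,
    n_informative=10,
    n_classes=3,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Each base learner is an SVM with a Gaussian (RBF) kernel. Bagging fits
# every SVM on a bootstrap sample of the training set and combines their
# predictions. ASSUMPTION: 10 estimators is an arbitrary illustrative value.
bagged_svm = BaggingClassifier(
    SVC(kernel="rbf", gamma="scale"),
    n_estimators=10,
    random_state=0,
)

# ASSUMPTION: features are standardized before the SVMs see them.
model = make_pipeline(StandardScaler(), bagged_svm)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
accuracy = model.score(X_test, y_test)
print(f"Held-out accuracy on synthetic data: {accuracy:.3f}")
```

The accuracy printed here reflects only the synthetic data and says nothing about performance on EmoDB, IITKGP-SEHSC, or RAVDESS.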

Keywords: Speech emotion recognition, Machine learning, Ensemble learning

Review history: Received 24 December 2018, Revised 14 June 2019, Accepted 27 July 2019, Available online 2 August 2019, Version of Record 11 October 2019.

Official paper URL: https://doi.org/10.1016/j.knosys.2019.104886