EnsembleWave: An ensembled approach for Automatic Speech Emotion Recognition

Date

2022

Publisher

Institute of Electrical and Electronics Engineers Inc.

Abstract

Accurately recognizing emotions from speech, and understanding the factors that determine that judgment, can improve the quality of a machine's decision-making. Current state-of-the-art architectures have focused on either deep learning-based approaches or hand-engineered features. As a result, these models fail to gather complete contextual information and generalize weakly across different datasets. This paper presents an end-to-end ensemble-based deep learning architecture that examines raw speech signals and classifies them into four basic emotions: Sad, Angry, Happy, and Neutral. The proposed EnsembleWave architecture incorporates an Attention Wavenet and hand-engineered feature extraction to assimilate a larger field of view and capture dataset-independent characteristics. The model achieves an overall accuracy of 98%, 85%, 74%, and 99% on four well-known Speech Emotion Recognition (SER) datasets, EMO-DB, SAVEE, CREMA-D, and TESS, respectively, outperforming state-of-the-art techniques both quantitatively and qualitatively. The proposed architecture can also learn a generalized categorization of emotions across different datasets. The Python source code of the proposed model will be available at https://github.com/deepanshi-s/EnsembleWave. © 2022 IEEE.
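The abstract does not say how the raw-waveform branch and the hand-engineered feature branch are combined. As a rough illustration only, the sketch below assumes a simple weighted average of the two branches' class probabilities (soft voting). The function names, the `weight` parameter, and the example logits are hypothetical and are not taken from the paper or its repository.

```python
import numpy as np

# The four emotion classes named in the abstract.
EMOTIONS = ["Sad", "Angry", "Happy", "Neutral"]


def softmax(logits):
    """Convert per-class scores to probabilities along the last axis."""
    logits = np.asarray(logits, dtype=float)
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def ensemble_predict(wave_logits, feature_logits, weight=0.5):
    """Soft-voting fusion of two branches (an assumed scheme, not the paper's).

    wave_logits:    scores from a hypothetical raw-waveform branch, shape (n, 4)
    feature_logits: scores from a hypothetical hand-engineered feature branch, shape (n, 4)
    weight:         share given to the waveform branch, between 0 and 1
    """
    probs = weight * softmax(wave_logits) + (1.0 - weight) * softmax(feature_logits)
    labels = [EMOTIONS[i] for i in probs.argmax(axis=-1)]
    return labels, probs


# Made-up scores for two utterances, used only to exercise the function.
wave = [[2.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
feat = [[1.0, 0.0, 0.0, 0.0], [0.0, 3.0, 0.0, 0.0]]
labels, probs = ensemble_predict(wave, feat)
print(labels)
```

In the second utterance the two branches disagree, and the more confident branch decides the averaged prediction. That is the usual motivation for fusing probabilities rather than hard labels.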

Keywords

Ensembled Approach, Self Attention, Speech Emotion Recognition, Wavenet

Citation

2022 IEEE International Conference on Electronics, Computing and Communication Technologies, CONECCT 2022, 2022
