CNN-MFCC Model for Speaker Recognition using Emotive Speech
Date
2023
Authors
Journal Title
Journal ISSN
Volume Title
Publisher
Institute of Electrical and Electronics Engineers Inc.
Abstract
Identifying a speaker from their voice is called "speaker recognition." Emotive Environment Speaker Recognition (EESR) identifies speakers from speech uttered in distinct emotional states. Speaker recognition across varying moods is a real-life requirement for many applications: recognition algorithms work almost flawlessly when speech carries no emotion, but accuracy degrades once it does. This work aims to improve the accuracy of a text-dependent speaker recognition system in emotional speech contexts. The proposed method uses Mel-Frequency Cepstral Coefficient (MFCC) features with a Convolutional Neural Network (CNN) classifier across various emotions. The system's performance is assessed on emotional datasets from the Kannada language and the Emotional Database (EmoDB); both datasets contain the emotions happy, sad, angry, fear, and neutral. Due to the complexity of emotion, speaker recognition across emotional states is challenging. The proposed system achieves an accuracy of 96.2% on EmoDB and 97.8% on the Kannada dataset, providing a high recognition rate for different emotions. © 2023 IEEE.
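The abstract names MFCC features as the front end of the proposed pipeline. Below is a minimal NumPy sketch of standard MFCC extraction (framing, Hamming window, power spectrum, triangular mel filterbank, log, DCT-II), assuming conventional parameters of 16 kHz audio, 25 ms frames with a 10 ms hop, 26 mel bands, and 13 cepstral coefficients. These parameter values and function names are illustrative assumptions, not the authors' exact configuration.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512, n_mels=26, n_ceps=13):
    """Textbook MFCC extraction; parameters are common defaults, not the paper's."""
    # Slice the signal into overlapping frames and apply a Hamming window
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Power spectrum of each frame
    mag = np.abs(np.fft.rfft(frames, n_fft))
    power = (mag ** 2) / n_fft
    # Triangular mel filterbank spanning 0 Hz to Nyquist
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, left:center] = (np.arange(left, center) - left) / max(center - left, 1)
        fbank[i - 1, center:right] = (right - np.arange(center, right)) / max(right - center, 1)
    log_mel = np.log(power @ fbank.T + 1e-10)
    # DCT-II decorrelates the log-mel energies; keep the first n_ceps coefficients
    n = np.arange(n_mels)
    dct_basis = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_mels))
    return log_mel @ dct_basis.T

# Example: MFCCs of one second of a synthetic 440 Hz tone
sig = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
feats = mfcc(sig)
print(feats.shape)  # (98, 13): one 13-coefficient vector per frame
```

The resulting per-frame coefficient matrix (or a 2-D patch of it) is the kind of input a CNN classifier, as named in the abstract, would consume.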
Description
Keywords
Convolutional Neural Network, Deep Neural Network, Mel Frequency Cepstral Coefficient, Speaker Recognition
Citation
2023 IEEE 8th International Conference for Convergence in Technology (I2CT), 2023.
