CNN-MFCC Model for Speaker Recognition using Emotive Speech

dc.contributor.authorTomar, S.
dc.contributor.authorKoolagudi, S.G.
dc.date.accessioned2026-02-06T06:34:57Z
dc.date.issued2023
dc.description.abstractIdentifying a speaker from their voice is called speaker recognition. Emotive Environment Speaker Recognition (EESR) identifies speakers from speech produced in distinct emotional states. Recognizing speakers across various moods is a real-life requirement for many applications. Speaker recognition algorithms work almost flawlessly when the conversation carries no emotion, but performance degrades in emotional contexts. This work aims to improve the accuracy of text-dependent speaker recognition systems in emotional speech contexts. The proposed method uses Mel-Frequency Cepstral Coefficient (MFCC) features with a Convolutional Neural Network (CNN) classifier for various emotions. The suggested system's performance is assessed on emotional datasets from the Kannada language and the Emotional Database (EmoDB). Both datasets contain the emotions happy, sad, angry, fear, and neutral. Owing to the complexity of emotions, speaker recognition across emotional states is challenging. The proposed system achieves an accuracy of 96.2% on EmoDB and 97.8% on the Kannada dataset, providing a high recognition rate across different emotions. © 2023 IEEE.
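The abstract's feature stage (MFCC extraction) can be sketched in plain NumPy. This is a minimal illustration of the standard MFCC pipeline (framing, windowing, power spectrum, mel filterbank, log compression, DCT-II), not the authors' exact configuration; all parameter values (`n_fft`, `hop`, `n_mels`, `n_mfcc`) are illustrative assumptions.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters spaced evenly on the mel scale
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fb[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fb[i - 1, k] = (r - k) / max(r - c, 1)
    return fb

def mfcc(signal, sr=16000, n_fft=512, hop=160, n_mels=26, n_mfcc=13):
    # Frame the signal, apply a Hamming window, take the power spectrum
    n_frames = 1 + (len(signal) - n_fft) // hop
    window = np.hamming(n_fft)
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Mel filterbank energies with log compression
    log_e = np.log(power @ mel_filterbank(sr, n_fft, n_mels).T + 1e-10)
    # DCT-II decorrelates the log energies into cepstral coefficients
    n = np.arange(n_mels)
    basis = np.cos(np.pi * np.outer(np.arange(n_mfcc), (2 * n + 1) / (2.0 * n_mels)))
    return log_e @ basis.T  # shape: (n_frames, n_mfcc)

# Example: MFCC matrix of a synthetic 1-second 440 Hz tone
sr = 16000
t = np.arange(sr) / sr
feats = mfcc(np.sin(2 * np.pi * 440 * t), sr=sr)
print(feats.shape)  # one 13-coefficient vector per 10 ms frame
```

In the paper's setup, a matrix like `feats` would then be fed to the CNN classifier; the CNN architecture itself is not specified in the abstract, so it is omitted here.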
dc.identifier.citation2023 IEEE 8th International Conference for Convergence in Technology, I2CT 2023, 2023.
dc.identifier.urihttps://doi.org/10.1109/I2CT57861.2023.10126402
dc.identifier.urihttps://idr.nitk.ac.in/handle/123456789/29556
dc.publisherInstitute of Electrical and Electronics Engineers Inc.
dc.subjectConvolutional Neural Network
dc.subjectDeep Neural Network
dc.subjectMel Frequency Cepstral Coefficient
dc.subjectSpeaker Recognition
dc.titleCNN-MFCC Model for Speaker Recognition using Emotive Speech