Conference Papers
Permanent URI for this collection: https://idr.nitk.ac.in/handle/123456789/28506
Search Results
Item: CNN-MFCC Model for Speaker Recognition using Emotive Speech (Institute of Electrical and Electronics Engineers Inc., 2023) Tomar, S.; Koolagudi, S.G.
Identifying a speaker from their voice is called "speaker recognition." Emotive Environment Speaker Recognition (EESR) identifies speakers from speech uttered in distinct emotional states. Recognizing speakers across varied moods is a real-life requirement for many applications. When the conversation carries no emotion, speaker recognition algorithms work almost flawlessly. This work aims to improve the accuracy of a text-dependent speaker recognition system in emotional speech contexts. The proposed method uses Mel-Frequency Cepstral Coefficient (MFCC) features with a Convolutional Neural Network (CNN) classifier across various emotions. The system's performance is assessed on two emotional speech datasets: a Kannada-language dataset and the Emotional Database (EmoDB). Both datasets contain the same five emotions: happy, sad, angry, fear, and neutral. Speaker recognition across emotional states is challenging because of the complexity of emotions. The proposed system achieves an accuracy of 96.2% on EmoDB and 97.8% on the Kannada dataset, a high recognition rate across the different emotions. © 2023 IEEE.

Item: NITK-KLESC: Kannada Language Emotional Speech Corpus for Speaker Recognition (Institute of Electrical and Electronics Engineers Inc., 2023) Tomar, S.; Gupta, P.; Koolagudi, S.G.
This work introduces an emotional speech dataset for the Speaker Recognition (SR) task. The dataset is recorded in the Kannada language from speakers in the state of Karnataka, India. The speech is collected by simulating five emotions: Fear, Sad, Anger, Happy, and Neutral. The dataset is named the National Institute of Technology Karnataka, India - Kannada Language Emotional Speech Corpus (NITK-KLESC).
The proposed dataset will be useful for SR tasks across various emotions, as well as for emotion recognition, analysis of emotional speech, speech recognition, gender identification, and age identification for the 20 to 50 years age group. The work describes the acquisition, development, processing, analysis, and evaluation of the proposed emotional speech dataset (NITK-KLESC). Emotional speech was analyzed using basic speech parameters such as Pitch, Tempo, Intensity, and Zero Crossing Rate (ZCR). The characteristics of the dataset are reported using MFCC feature extraction with a CNN model as the classifier, and compared with the existing EmoDB dataset. The average accuracy of the Emotional Speech Speaker Recognition (ESSR) task was 84.44% on the EmoDB dataset and 95.2% on the proposed NITK-KLESC dataset. © 2023 IEEE.
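Two of the basic speech parameters named in the abstract above, Zero Crossing Rate and Intensity, can be illustrated with a minimal sketch. This is not the authors' code: the function names, the frame and hop lengths, the RMS-in-decibels definition of intensity, and the synthetic test tone are all assumptions chosen for illustration, and only the Python standard library is used.

```python
import math


def frame_signal(samples, frame_len, hop_len):
    """Split samples into overlapping frames; a trailing partial frame is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop_len)]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def rms_intensity_db(frame, eps=1e-12):
    """Mean-square level of a frame in decibels, relative to an amplitude of 1.0."""
    mean_sq = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(mean_sq + eps)


# Hypothetical input: a 100 Hz sine tone sampled at 8 kHz for 0.1 s,
# standing in for a recorded utterance.
sample_rate = 8000
signal = [math.sin(2 * math.pi * 100 * n / sample_rate) for n in range(800)]

# Assumed analysis settings: 25 ms frames (200 samples) with a 12.5 ms hop.
frames = frame_signal(signal, frame_len=200, hop_len=100)
zcr_per_frame = [zero_crossing_rate(f) for f in frames]
intensity_per_frame = [rms_intensity_db(f) for f in frames]
```

On real recordings these per-frame values would typically be summarized per utterance (for example by mean and variance) before comparing emotions; whether the paper does so is not stated in the abstract.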
