Conference Papers

Permanent URI for this collection: https://idr.nitk.ac.in/handle/123456789/28506

Now showing 1 - 5 of 5
  • Item
    Speaker Recognition in Emotional Environment using Excitation Features
    (Institute of Electrical and Electronics Engineers Inc., 2020) Thomas, T.; Spoorthy; Sobhana, N.V.; Koolagudi, S.G.
    Speaker recognition is the task of identifying a person from his or her speech. It has many applications, including transaction authentication, access control, voice dialing, and web services. Emotive speaker recognition is important because, in real life, human beings extensively express emotions during conversations, and emotions alter the human voice. A text-independent speaker recognition system for emotional environments is proposed in this work. The system is trained on speech samples recorded in a neutral environment, and its evaluation is performed in an emotional environment. Excitation source features are used to represent the speaker-specific details contained in the speech signal. The excitation source signal is obtained after separating the segmental-level features from the voice samples. Because the excitation source signal closely resembles noise, identifying a speaker from it in an emotive environment is a challenging task. Excitation features include the Linear Prediction (LP) residual, Glottal Closure Instants (GCI), LP residual phase, residual cepstrum, Residual Mel-Frequency Cepstral Coefficients (R-MFCC), etc. A decrease in performance is observed when the system is trained with neutral speech samples and tested with emotional speech samples. The emotions considered for emotional speaker identification are happy, sad, anger, fear, neutral, surprise, disgust, and sarcastic. For the classification of speakers, the algorithms used are Gaussian Mixture Model (GMM), Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Random Forest, and Naive Bayes. © 2020 IEEE.
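The LP residual mentioned in the abstract can be illustrated with a generic sketch; this is not the authors' implementation, only the standard autocorrelation-method LPC via the Levinson-Durbin recursion, with the frame inverse-filtered by its own coefficients to approximate the excitation source signal (function names and the order of 10 are illustrative assumptions).

```python
import numpy as np

def lp_coefficients(frame, order):
    """Estimate LP coefficients [1, a1, ..., ap] by the autocorrelation
    method using the Levinson-Durbin recursion."""
    n = len(frame)
    r = np.correlate(frame, frame, mode="full")[n - 1:n + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # reflection coefficient from the prediction error so far
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        a[1:i] += k * a[i - 1:0:-1]   # symmetric coefficient update
        a[i] = k
        err *= 1.0 - k * k
    return a

def lp_residual(frame, order=10):
    """Inverse-filter the frame with its own LP coefficients; the output
    approximates the (noise-like) excitation source signal."""
    a = lp_coefficients(frame, order)
    return np.convolve(frame, a)[:len(frame)]
```

On a frame generated by an all-pole (AR) process, the residual energy is much lower than the signal energy, which is exactly the "noise-like" property the abstract describes.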
  • Item
    CNN-MFCC Model for Speaker Recognition using Emotive Speech
    (Institute of Electrical and Electronics Engineers Inc., 2023) Tomar, S.; Koolagudi, S.G.
    Identifying the person speaking from a voice sample is called "speaker recognition." Emotive Environment Speaker Recognition (EESR) identifies speakers from distinctly emotional speech. Speaker recognition across varied emotional states is a real-life requirement for many applications. When there is no emotion in the conversation, speaker recognition algorithms work almost flawlessly. This work aims to improve the accuracy of text-dependent speaker recognition in emotional speech contexts. The proposed method uses Mel-Frequency Cepstral Coefficient (MFCC) features with a Convolutional Neural Network (CNN) classifier for various emotions. The suggested system's performance is assessed on two emotional datasets: a Kannada-language dataset and the Emotional Database (EmoDB). Both datasets contain the emotions happy, sad, angry, fear, and neutral. Due to the complexity of emotions, speaker recognition across emotional states is challenging. The proposed system achieves an accuracy of 96.2% on EmoDB and 97.8% on the Kannada dataset, providing a high recognition rate across different emotions. © 2023 IEEE.
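The MFCC front end used here follows a standard pipeline (framing and windowing, power spectrum, triangular mel filterbank, log, DCT). The sketch below shows that generic pipeline in plain numpy/scipy; the frame length, hop, filter count, and coefficient count are illustrative assumptions, not the paper's configuration.

```python
import numpy as np
from scipy.fftpack import dct

def mfcc(signal, sr, n_fft=512, hop=160, n_mels=26, n_ceps=13):
    """Generic MFCC extraction: frames x n_ceps coefficient matrix."""
    # frame the signal and apply a Hamming window
    win = np.hamming(n_fft)
    frames = np.array([signal[s:s + n_fft] * win
                       for s in range(0, len(signal) - n_fft + 1, hop)])
    # per-frame power spectrum
    spec = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # triangular mel filterbank between 0 Hz and sr/2
    mel_hi = 2595 * np.log10(1 + (sr / 2) / 700)
    hz_pts = 700 * (10 ** (np.linspace(0, mel_hi, n_mels + 2) / 2595) - 1)
    bins = np.floor((n_fft + 1) * hz_pts / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        lo, c, hi = bins[m - 1], bins[m], bins[m + 1]
        for k in range(lo, c):
            fbank[m - 1, k] = (k - lo) / max(c - lo, 1)   # rising edge
        for k in range(c, hi):
            fbank[m - 1, k] = (hi - k) / max(hi - c, 1)   # falling edge
    # log mel energies, then DCT-II to decorrelate
    log_mel = np.log(spec @ fbank.T + 1e-10)
    return dct(log_mel, type=2, axis=1, norm="ortho")[:, :n_ceps]
```

The resulting frames-by-coefficients matrix is the kind of 2-D input a CNN classifier like the one described above would consume.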
  • Item
    Analysis of Speaker Recognition in Blended Emotional Environment Using Deep Learning Approaches
    (Springer Science and Business Media Deutschland GmbH, 2023) Tomar, S.; Koolagudi, S.G.
    Generally, human conversation carries some emotion, and natural emotions are often blended. Today’s speaker recognition systems lack the component of emotion. This work proposes a Speaker Recognition in Blended Emotion Environment (SRBEE) system to enhance Speaker Recognition (SR) in an emotional context. Speaker recognition algorithms achieve nearly perfect performance on neutral speech, but this does not hold for emotional speech. This work attempts to recognize speakers in blended emotion using Mel-Frequency Cepstral Coefficients (MFCC) feature extraction with a 2-D Convolutional Neural Network (Conv2D) classifier. Measuring the accuracy of the speaker recognition task in a blended emotional environment is complex. Utterances blending four basic natural emotions (happy, sad, angry, and fearful) were tested in the proposed system to reduce SR’s complexity in a blended emotional environment. The proposed system achieves an average accuracy of 99.3% for blended emotion with neutral speech and 92.8% for the four basic blended natural emotions (happy, sad, angry, and fearful). The dataset was prepared by blending two emotions in one utterance. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023.
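The abstract states that blended utterances were prepared by combining two emotions in one utterance, but does not describe the joining procedure. One plausible way to build such a stimulus — purely a hypothetical sketch, not the authors' method — is to concatenate two emotional recordings with a short linear crossfade so the emotion changes smoothly mid-utterance:

```python
import numpy as np

def blend_utterances(utt_a, utt_b, overlap=800):
    """Join two emotional utterances into one 'blended' utterance with a
    linear crossfade over `overlap` samples (hypothetical procedure)."""
    fade = np.linspace(0.0, 1.0, overlap)
    cross = utt_a[-overlap:] * (1.0 - fade) + utt_b[:overlap] * fade
    return np.concatenate([utt_a[:-overlap], cross, utt_b[overlap:]])
```

The output length is `len(utt_a) + len(utt_b) - overlap`, and the crossfade region transitions gradually from the first emotion's audio to the second's.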
  • Item
    NITK-KLESC: Kannada Language Emotional Speech Corpus for Speaker Recognition
    (Institute of Electrical and Electronics Engineers Inc., 2023) Tomar, S.; Gupta, P.; Koolagudi, S.G.
    This work introduces an emotional speech dataset for the Speaker Recognition (SR) task. The dataset is recorded in the Kannada language from the people of Karnataka state, India. The speech is collected by simulating five different emotions: fear, sad, anger, happy, and neutral. The dataset is named the National Institute of Technology Karnataka, India - Kannada Language Emotional Speech Corpus (NITK-KLESC). The proposed dataset will be useful for SR tasks across various emotions, as well as for emotion recognition, analysis of emotional speech, speech recognition, gender identification, and age identification within the 20-to-50-year age group. The proposed work describes the development, acquisition, processing, analysis, and evaluation of the emotional speech dataset (NITK-KLESC). The analysis of emotional speech was done by considering basic speech parameters such as pitch, tempo, intensity, and Zero Crossing Rate (ZCR). The characteristics of the dataset are reported using MFCC feature extraction with a CNN model as the classifier, and compared with the existing EmoDB dataset. The average accuracy of the Emotional Speech Speaker Recognition (ESSR) task was measured at 84.44% on the EmoDB dataset and 95.2% on the proposed NITK-KLESC dataset. © 2023 IEEE.
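Two of the analysis parameters named above, Zero Crossing Rate and intensity, have simple standard per-frame definitions. The sketch below shows those generic computations (frame length and hop are illustrative assumptions; the paper's exact settings are not given):

```python
import numpy as np

def frame_zcr_intensity(signal, frame_len=400, hop=160):
    """Per-frame Zero Crossing Rate (fraction of adjacent samples that
    change sign) and intensity (RMS energy)."""
    zcr, rms = [], []
    for start in range(0, len(signal) - frame_len + 1, hop):
        f = signal[start:start + frame_len]
        # a zero crossing is any sign change between neighbouring samples
        zcr.append(np.mean(np.abs(np.diff(np.sign(f))) > 0))
        rms.append(np.sqrt(np.mean(f ** 2)))
    return np.array(zcr), np.array(rms)
```

ZCR rises with the dominant frequency content of a frame, which is why it is useful for contrasting emotional speaking styles; RMS intensity tracks loudness.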
  • Item
    NITK-TIEKLS: A Text-Independent Emotional Kannada Language Speech Dataset for Speaker Recognition
    (Springer Science and Business Media Deutschland GmbH, 2025) Tomar, S.; Koolagudi, S.G.
    Speaker recognition systems have traditionally relied on the consistency of speech content to identify individuals. However, text-independent speaker recognition, which works irrespective of the spoken content, presents a more flexible and robust alternative, especially in real-world scenarios. This research focuses on enhancing text-independent speaker recognition by introducing a dataset for the Speaker Recognition (SR) task, named the National Institute of Technology Karnataka - Text-Independent Emotional Kannada Language Speech (NITK-TIEKLS) dataset. Two hundred natives of the Karnataka state of India recorded emotional speech in the Kannada language for the proposed dataset. The neutral text-independent speech lasts 4 min per speaker. Each speaker also recorded two text-independent emotional utterances of 2 min each, drawn from any two of the emotions anger, happiness, sadness, and fear. The total duration is approximately 30 h. The proposed study includes developing, acquiring, processing, analyzing, and evaluating the dataset, and reports performance evaluations of the SR system on it using deep learning techniques with the proposed Wavelet-Mel Spectrogram feature. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.