Faculty Publications

Permanent URI for this community: https://idr.nitk.ac.in/handle/123456789/18736

Publications by NITK Faculty

Search Results

  • Item
    Analysis of Speaker Recognition in Blended Emotional Environment Using Deep Learning Approaches
    (Springer Science and Business Media Deutschland GmbH, 2023) Tomar, S.; Koolagudi, S.G.
    Generally, human conversation carries emotion, and natural emotions are often blended. Today’s Speaker Recognition systems lack an emotional component. This work proposes a Speaker Recognition in Blended Emotion Environment (SRBEE) system to enhance Speaker Recognition (SR) in an emotional context. Speaker Recognition algorithms nearly always achieve perfect performance on neutral speech, but this does not hold for emotional speech. This work recognizes speakers in blended emotional speech using Mel-Frequency Cepstral Coefficient (MFCC) feature extraction and a Conv2D classifier. Calculating the accuracy of the Speaker Recognition task in a blended emotional environment is complex. Utterances blending four basic natural emotions (happy, sad, angry, and fearful) are tested in the proposed system to reduce the complexity of SR in a blended emotional environment. The proposed system achieves an average accuracy of 99.3% for blended emotion with neutral speech and 92.8% for the four basic blended natural emotions (happy, sad, angry, and fearful). The dataset was prepared by blending two emotions in one utterance. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023.
    (A minimal MFCC and Conv2D sketch follows this list.)
  • Item
    Blended-emotional speech for Speaker Recognition by using the fusion of Mel-CQT spectrograms feature extraction
    (Elsevier Ltd, 2025) Tomar, S.; Koolagudi, S.G.
    Emotions are integral to human speech, adding depth and influencing the effectiveness of interactions. Speech with a single emotion is speech in which the emotional state stays the same throughout the utterance. Unlike single emotion, blended emotion involves a mix of emotions, such as happiness tinged with sadness or a shift from neutral to sadness within the same utterance. In real-life scenarios, people often experience and express mixed emotions. Most existing works on Speaker Recognition (SR), which recognizes a person from their voice, have focused on either neutral speech or a few primary emotions. This study aims to develop Blended-Emotional Speaker Recognition (BESR). In the proposed work, emotional information in speech signals is exploited by simulating a blended emotional speech dataset for Speaker Recognition. A fusion of Mel Spectrograms and Constant-Q Transform Spectrograms (Mel-CQT Spectrograms) is developed to extract features. Three datasets, namely the National Institute of Technology Karnataka Kannada Language Emotional Speech Corpus (NITK-KLESC), the Crowd-Sourced Emotional Multimodal Actors Dataset (CREMA-D), and the Indian Institute of Technology Kharagpur Simulated Emotion Hindi Speech Corpus (IITKGP-SEHSC), are considered for the proposed work. The experimental outcomes demonstrate that the BESR system using blended emotional speech improves the fairness of Speaker Recognition. © 2025 Elsevier Ltd
    (A minimal Mel-CQT fusion sketch follows this list.)
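
The first item above pairs MFCC features with a Conv2D classifier. The sketch below is a minimal illustration, not the authors' implementation: it assumes librosa and Keras as the toolchain, a 16 kHz sampling rate, 13 MFCCs padded or truncated to 200 frames, and an illustrative layer configuration; `mfcc_features`, `build_model`, and `n_speakers` are hypothetical names, since the abstract does not specify the architecture or hyperparameters.

```python
# Minimal sketch (not the authors' code): MFCC features fed to a small
# Conv2D speaker classifier. Sampling rate, frame count, and layer sizes
# are illustrative assumptions.
import librosa
import numpy as np
import tensorflow as tf

def mfcc_features(path, sr=16000, n_mfcc=13, frames=200):
    """Load one utterance and return a fixed-size MFCC map (n_mfcc x frames)."""
    y, _ = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    # Pad or truncate along the time axis so every utterance has the same shape.
    if mfcc.shape[1] < frames:
        mfcc = np.pad(mfcc, ((0, 0), (0, frames - mfcc.shape[1])))
    return mfcc[:, :frames]

def build_model(n_speakers, n_mfcc=13, frames=200):
    """Small Conv2D classifier over MFCC maps; one softmax output per speaker."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(n_mfcc, frames, 1)),
        tf.keras.layers.Conv2D(32, (3, 3), activation="relu"),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(n_speakers, activation="softmax"),
    ])
```

A model built this way would be trained with categorical cross-entropy on MFCC maps expanded to a single channel (e.g. `mfcc_features(path)[..., np.newaxis]`), with one class per enrolled speaker.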
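The second item fuses Mel and Constant-Q Transform spectrograms. Below is a minimal sketch, assuming librosa and a simple stack of the two log-scaled representations along the frequency axis; the paper's exact fusion scheme, sampling rate, and bin counts are not given in the abstract, so all parameter values here are assumptions.

```python
# Minimal sketch (not the authors' code): Mel-CQT spectrogram fusion for one
# utterance. The frequency-axis stacking and all parameter values are
# illustrative assumptions.
import librosa
import numpy as np

def mel_cqt_fusion(path, sr=16000, n_mels=128, n_bins=84):
    """Return a (n_mels + n_bins) x frames matrix of fused log spectrograms."""
    y, _ = librosa.load(path, sr=sr)
    # Log-power Mel spectrogram.
    mel = librosa.power_to_db(
        librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels))
    # Log-magnitude Constant-Q Transform spectrogram.
    cqt = librosa.amplitude_to_db(np.abs(librosa.cqt(y, sr=sr, n_bins=n_bins)))
    # Align the time axes before stacking the two representations.
    frames = min(mel.shape[1], cqt.shape[1])
    return np.vstack([mel[:, :frames], cqt[:, :frames]])
```

The fused matrix can then be treated like any other two-dimensional time-frequency feature map, for example as input to a Conv2D classifier of the kind sketched above.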