Conference Papers

Permanent URI for this collection: https://idr.nitk.ac.in/handle/123456789/28506

Search Results

Now showing 1 - 2 of 2
  • Item
    Identification of Speaker-Specific Features to Minimize the Mismatch Outcomes for Speaker Recognition Using Anger and Happy Emotional Speech
    (Springer Science and Business Media Deutschland GmbH, 2025) Tomar, S.; Koolagudi, S.G.
    A vital component of digital speech processing is Speaker Recognition (SR). However, variation in speakers’ emotional states, such as happiness, anger, sadness, or fear, poses a significant challenge that compromises the robustness of speaker recognition systems. Research on SR using emotive speech shows that emotions such as “anger” and “happy” are particularly difficult to distinguish. This study examines prosody-related speech characteristics to determine how to separate “anger” and “happy” emotional speech for SR tasks, with the goal of identifying speaker-specific features. The experimental outcomes demonstrate that Intensity, Pitch, and Brightness (IPB) variables, used as speaker-specific features for the SR task, can distinguish between angry and happy emotional speech. Combining IPB and MFCC (IPBCC) features with a hybrid CNN-LSTM model equipped with an attention mechanism achieves an SR accuracy of 95.45% for anger and 96.22% for happy emotional speech. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
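The abstract does not give exact definitions of the IPB features. A minimal sketch, assuming common signal-processing proxies (RMS energy for intensity, the strongest autocorrelation peak for pitch, and the spectral centroid for brightness; all frame sizes and ranges are illustrative, not from the paper):

```python
import numpy as np

def frame_features(frame, sr=16000):
    """Illustrative IPB-style features for one speech frame.

    These are stand-in definitions, not the paper's: RMS energy
    for intensity, autocorrelation-peak lag for pitch, and
    spectral centroid for brightness.
    """
    # Intensity: root-mean-square energy of the frame.
    intensity = np.sqrt(np.mean(frame ** 2))

    # Pitch: lag of the strongest autocorrelation peak within a
    # plausible speech range (50-400 Hz).
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = sr // 400, sr // 50
    lag = lo + np.argmax(ac[lo:hi])
    pitch = sr / lag

    # Brightness: spectral centroid (magnitude-weighted mean frequency).
    spectrum = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    brightness = np.sum(freqs * spectrum) / np.sum(spectrum)

    return intensity, pitch, brightness

# Usage: a 125 Hz sine (an exact number of periods in the frame)
# should yield pitch and brightness near 125 Hz.
sr = 16000
t = np.arange(2048) / sr
i, p, b = frame_features(np.sin(2 * np.pi * 125 * t), sr)
```

In practice these per-frame values would be aggregated over an utterance and concatenated with MFCCs to form the IPBCC feature vector the abstract describes.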
  • Item
    Transformation of Emotional Speech to Anger Speech to Reduce Mismatches in Testing and Enrollment Speech for Speaker Recognition System
    (Springer Science and Business Media Deutschland GmbH, 2025) Tomar, S.; Koolagudi, S.G.
    Speaker Recognition (SR) is a critical component of digital speech processing. The robustness of speaker recognition systems is compromised by variance in speakers’ emotional states. Studies on SR using emotive speech show that emotions such as “anger,” “sad,” “fear,” and “happy” are difficult to distinguish. Developing a speaker recognition model that works effectively on emotional speech is challenging, particularly for intense emotions such as anger. This work explores emotional speech transformation approaches to reduce the mismatch between training and testing emotional speech for SR tasks. The proposed work develops speech transformation techniques that convert different emotional speech into anger. The study modifies the prosodic features “TPIB” (Tempo, Pitch, Intensity, and Brightness) to transform speech from neutral, happy, fearful, and sad emotions to anger. The SR system using the transformed emotional speech is evaluated by combining Mel-spectrogram feature extraction with deep learning techniques on the CREMA-D and NITK-KLESC datasets. The experimental results demonstrate that the suggested transformation technique increases SR accuracy by approximately 15% for neutral, 11% for happy, 32% for sad, and 30% for fearful speech. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
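The abstract does not specify how the TPIB attributes are modified. A crude sketch of the idea, using simple stand-ins (naive resampling to raise tempo and pitch together, a gain for intensity, and a pre-emphasis filter to boost brightness; the function name and all parameter values are illustrative assumptions, not the paper's method):

```python
import numpy as np

def toward_anger(x, rate=1.15, gain=1.6, preemph=0.6):
    """Crude TPIB-style nudge of a speech signal toward an
    anger-like profile. Illustrative only: the paper's actual
    transformation is not described in the abstract.
    """
    # Tempo/pitch: naive resampling via linear interpolation;
    # rate > 1 shortens the signal and raises pitch together.
    n_out = int(len(x) / rate)
    idx = np.arange(n_out) * rate
    y = np.interp(idx, np.arange(len(x)), x)

    # Brightness: first-order pre-emphasis y[n] - a*y[n-1]
    # boosts high frequencies relative to low ones.
    y = np.append(y[0], y[1:] - preemph * y[:-1])

    # Intensity: amplitude gain, clipped to the [-1, 1] range.
    return np.clip(gain * y, -1.0, 1.0)

# Usage: transform one second of low-level noise standing in
# for a speech signal.
rng = np.random.default_rng(0)
x = 0.05 * rng.standard_normal(16000)
y = toward_anger(x)
```

A real transformation would treat pitch and tempo independently (e.g. via a phase vocoder or PSOLA-style processing) rather than coupling them through resampling as this sketch does.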