Conference Papers

Permanent URI for this collection: https://idr.nitk.ac.in/handle/123456789/28506

Now showing 1 - 2 of 2
  • Item
    EnsembleWave: An ensembled approach for Automatic Speech Emotion Recognition
    (Institute of Electrical and Electronics Engineers Inc., 2022) Barkur, R.; Deepansh; I Suresh, D.; Mahesh Kumar, T.N.; Narasimhadhan, A.V.
    Accurate recognition of emotions from speech, together with an understanding of the factors behind the judgment, can improve a machine's decision-making quality. Current state-of-the-art architectures rely on either deep learning-based approaches or hand-engineered features; as a result, models fail to capture complete contextual information and generalize weakly across datasets. This paper presents an end-to-end ensemble-based deep learning architecture that examines raw speech signals and classifies them into the four basic emotions - Sad, Angry, Happy, and Neutral. The proposed EnsembleWave architecture combines Attention WaveNet with hand-engineered feature extraction to assimilate a larger field of view and capture dataset-independent characteristics. The model achieves overall accuracies of 98%, 85%, 74%, and 99% on four well-known Speech Emotion Recognition (SER) datasets - EMO-DB, SAVEE, CREMA-D, and TESS, respectively - outperforming state-of-the-art techniques both quantitatively and qualitatively. The proposed architecture can also learn a generalized categorization of emotions across different datasets. The Python source code of the proposed model will be available at https://github.com/deepanshi-s/EnsembleWave © 2022 IEEE.
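    The abstract describes fusing a deep branch (Attention WaveNet) with a hand-engineered-feature branch. The exact fusion rule is not given, so the following is a minimal hypothetical sketch, assuming each branch emits one score per emotion and the ensemble averages the two branches' class probabilities; the function names and the blending weight are illustrative, not from the paper.

    ```python
    import numpy as np

    EMOTIONS = ["sad", "angry", "happy", "neutral"]

    def softmax(logits):
        # Numerically stable softmax over the last axis.
        z = logits - logits.max(axis=-1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)

    def ensemble_predict(deep_logits, handcrafted_logits, weight=0.5):
        """Blend the class probabilities of two branches.

        deep_logits / handcrafted_logits: (batch, 4) arrays, one score per
        emotion from each branch; `weight` sets the deep branch's share.
        """
        probs = (weight * softmax(deep_logits)
                 + (1.0 - weight) * softmax(handcrafted_logits))
        return [EMOTIONS[i] for i in probs.argmax(axis=-1)]

    # Toy scores for a single utterance: both branches favour "angry".
    deep = np.array([[0.1, 2.3, 0.4, 0.2]])
    hand = np.array([[0.0, 1.8, 1.5, 0.1]])
    print(ensemble_predict(deep, hand))  # -> ['angry']
    ```

    Averaging probabilities rather than raw logits keeps the two branches on a common scale even when their score magnitudes differ.
    
    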
  • Item
    An Improved Method for Speech Enhancement Using Convolutional Neural Network Approach
    (Institute of Electrical and Electronics Engineers Inc., 2022) Mahesh Kumar, T.N.; Hegde, P.; Deepak, K.T.; Narasimhadhan, A.V.
    Speech enhancement is one of the most widely used techniques in the speech processing domain. With the development of deep neural networks and the availability of powerful hardware, multiple deep learning-based speech enhancement models have emerged in recent years. In this work, a speech enhancement technique using a Convolutional Neural Network (CNN) as a Denoising Autoencoder (DAE) is investigated and compared with the conventional feed-forward topology. Further, the proposed model is analyzed at various SNR levels on corrupted English speech and is also tested on unseen speech data that includes additional SNR levels. Simulation results show that the proposed model outperforms the existing model in terms of Perceptual Evaluation of Speech Quality (PESQ) and Log Spectral Distance (LSD). The network achieved 3% higher scores than feed-forward neural networks, and the convolutional DAEs are found to perform better than their feed-forward counterparts. © 2022 IEEE.
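    The abstract evaluates enhancement quality with Log Spectral Distance (LSD). The paper's exact formulation is not stated, so the sketch below uses one common convention: per frame, the RMS difference between the log-power spectra (in dB) of reference and estimate, averaged over frames. The function name and the `eps` floor are illustrative assumptions.

    ```python
    import numpy as np

    def log_spectral_distance(ref_mag, est_mag, eps=1e-10):
        """Log Spectral Distance between two magnitude spectrograms.

        ref_mag, est_mag: (frames, freq_bins) non-negative arrays.
        Convention assumed here: per-frame RMS difference of the
        log-power spectra in dB, averaged across frames.
        """
        ref_db = 10.0 * np.log10(np.maximum(ref_mag, eps) ** 2)
        est_db = 10.0 * np.log10(np.maximum(est_mag, eps) ** 2)
        per_frame = np.sqrt(np.mean((ref_db - est_db) ** 2, axis=-1))
        return float(np.mean(per_frame))

    # Identical spectrograms give 0 dB; a uniform 10% gain error gives
    # a constant offset of 20*log10(1.1) ~= 0.828 dB.
    ref = np.abs(np.random.default_rng(0).standard_normal((100, 257)))
    print(log_spectral_distance(ref, ref))            # -> 0.0
    print(round(log_spectral_distance(ref, ref * 1.1), 3))  # -> 0.828
    ```

    Lower LSD means the enhanced spectrum is closer to the clean reference, which is why the metric pairs naturally with PESQ in the comparison above.
    
    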