Noise Cancellation by Fast Fourier Transform for Wav2Vec2.0 based Speech-to-Text System

dc.contributor.author: Gupta, S.P.
dc.contributor.author: Spoorthy, V.
dc.contributor.author: Koolagudi, S.G.
dc.date.accessioned: 2026-02-06T06:34:57Z
dc.date.issued: 2023
dc.description.abstract: Speech-To-Text (STT) systems belong to the speech recognition domain: they take speech as input and generate a transcript. Background noise in the input speech can disrupt an STT system and lead to incorrect transcripts. In this work, we present a Fast Fourier Transform (FFT) based noise cancellation method for Hindi words with background noise and perform speech-to-text conversion using a pre-trained Wav2Vec2.0 model that has been fine-tuned. The background noise added to the audio samples is Gaussian white noise at three intensity levels, given by the standard deviation (STD) of the Gaussian distribution: 0.01, 0.03, and 0.05 units. The model has been trained on the OpenSLR Hindi dataset. The proposed system is evaluated using the Character Error Rate (CER) metric. The model is tested on 20 Hindi words in both clean and noisy conditions. The results show that noise cancellation is effective in terms of CER; at the first noise level (STD of 0.01), the CER after noise cancellation is better than that of the noisy counterpart. © 2023 IEEE.
dc.identifier.citation: 2023 IEEE 8th International Conference for Convergence in Technology, I2CT 2023, 2023
dc.identifier.uri: https://doi.org/10.1109/I2CT57861.2023.10126221
dc.identifier.uri: https://idr.nitk.ac.in/handle/123456789/29559
dc.publisher: Institute of Electrical and Electronics Engineers Inc.
dc.subject: Fast Fourier Transform
dc.subject: Noise Cancellation
dc.subject: OpenSLR speech corpus
dc.subject: Speech-To-Text
dc.title: Noise Cancellation by Fast Fourier Transform for Wav2Vec2.0 based Speech-to-Text System
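
The abstract names three ingredients: Gaussian white noise at a given STD, FFT-based noise cancellation, and CER as the evaluation metric. The following is a minimal Python sketch of how these pieces might fit together, not the authors' implementation. The abstract does not specify the cancellation rule, so the magnitude-threshold mask, the `keep_ratio` parameter, all function names, and the 440 Hz test tone standing in for a speech sample are assumptions made for illustration. The CER here is the character-level edit distance divided by the reference length, counted over Unicode code points.

```python
import numpy as np


def add_white_noise(signal, std, seed=0):
    """Add zero-mean Gaussian white noise with the given standard deviation."""
    rng = np.random.default_rng(seed)
    return signal + rng.normal(0.0, std, size=signal.shape)


def fft_denoise(noisy, keep_ratio=0.1):
    """Zero every FFT bin whose magnitude is below keep_ratio * peak magnitude.

    Assumed rule for illustration only; the abstract does not state one.
    """
    spectrum = np.fft.rfft(noisy)
    magnitude = np.abs(spectrum)
    mask = magnitude >= keep_ratio * magnitude.max()
    return np.fft.irfft(spectrum * mask, n=len(noisy))


def cer(reference, hypothesis):
    """Character Error Rate: Levenshtein distance / reference length."""
    if len(reference) == 0:
        raise ValueError("reference must not be empty")
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, 1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, 1):
            curr.append(min(
                prev[j] + 1,                            # deletion
                curr[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_char != hyp_char),   # substitution or match
            ))
        prev = curr
    return prev[-1] / len(reference)


if __name__ == "__main__":
    # Synthetic stand-in for a speech sample: 1 s of a 440 Hz tone at 16 kHz.
    t = np.arange(16000) / 16000.0
    clean = 0.5 * np.sin(2 * np.pi * 440.0 * t)
    for std in (0.01, 0.03, 0.05):  # the three noise levels from the abstract
        noisy = add_white_noise(clean, std)
        denoised = fft_denoise(noisy)
        mse_noisy = np.mean((noisy - clean) ** 2)
        mse_denoised = np.mean((denoised - clean) ** 2)
        print(f"STD {std}: MSE noisy={mse_noisy:.2e}, denoised={mse_denoised:.2e}")
```

A pure tone is a best case for a global spectral threshold because all of its energy sits in one bin. Real speech is broadband and non-stationary, so a practical system would more likely apply the mask per short-time frame and tune the threshold against the CER of the downstream Wav2Vec2.0 transcripts.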
