Polyphonic Sound Event Detection Using Mel-Pseudo Constant Q-Transform and Deep Neural Network

dc.contributor.author: Spoorthy, V.
dc.contributor.author: Koolagudi, S.G.
dc.date.accessioned: 2026-02-04T12:25:42Z
dc.date.issued: 2024
dc.description.abstract: The task of identifying sound events in a particular environment is known as Sound Event Detection (SED) or Acoustic Event Detection (AED). Sound events occur in an unstructured manner and vary widely in both temporal structure and frequency content. They may be non-overlapping (monophonic) or overlapping (polyphonic). In real-world scenarios, polyphonic sound events are encountered more often than monophonic ones. In this paper, a Mel-Pseudo Constant Q-Transform (MP-CQT) technique is introduced for polyphonic SED so that both monophonic and polyphonic sound events are learned effectively. A pseudo-CQT technique is adapted to extract features from the audio files and their Mel spectrograms. The Mel scale is believed to broadly simulate the human perception system. The classifier used is a Convolutional Recurrent Neural Network (CRNN). A performance comparison of the proposed MP-CQT technique with the CRNN is presented, and a considerable improvement is observed. The proposed method achieved an average error rate of 0.684 and an average F1 score of 52.3%. Its robustness is also analyzed by adding noise to the audio files at different Signal-to-Noise Ratios (SNRs). The proposed method shows improved performance compared with state-of-the-art SED systems, and the new feature extraction technique yields a promising improvement in the performance of the polyphonic SED system. © 2024 IETE.
dc.identifier.citation: IETE Journal of Research, 2024, 70, 5, pp. 5031-5043
dc.identifier.issn: 0377-2063
dc.identifier.uri: https://doi.org/10.1080/03772063.2023.2253768
dc.identifier.uri: https://idr.nitk.ac.in/handle/123456789/21494
dc.publisher: Taylor and Francis Ltd.
dc.subject: Acoustic noise
dc.subject: Audio acoustics
dc.subject: Convolution
dc.subject: Convolutional neural networks
dc.subject: Recurrent neural networks
dc.subject: Signal to noise ratio
dc.subject: Acoustic event detection
dc.subject: Acoustic event detections
dc.subject: Computational auditory scene analysis
dc.subject: Convolutional recurrent neural network
dc.subject: Deep neural network
dc.subject: Polyphonic sound event detection
dc.subject: Polyphonic sounds
dc.subject: Pseudo-constant Q-transform
dc.subject: Sound event detection
dc.subject: Deep neural networks
dc.title: Polyphonic Sound Event Detection Using Mel-Pseudo Constant Q-Transform and Deep Neural Network
