Polyphonic sound event detection using transposed convolutional recurrent neural network

dc.contributor.author: Chatterjee, C.C.
dc.contributor.author: Mulimani, M.
dc.contributor.author: Koolagudi, S.G.
dc.date.accessioned: 2026-02-06T06:36:48Z
dc.date.issued: 2020
dc.description.abstract: In this paper, we propose a Transposed Convolutional Recurrent Neural Network (TCRNN) architecture for polyphonic sound event recognition. A transposed convolution layer, which carries out a regular convolution operation but reverses the spatial transformation, is combined with a bidirectional Recurrent Neural Network (RNN) to form the TCRNN. Instead of the traditional mel-spectrogram features, the proposed methodology incorporates mel-IFgram (Instantaneous Frequency spectrogram) features. The performance of the proposed approach is evaluated on sound events of the publicly available TUT-SED 2016 and Joint sound scene and polyphonic sound event recognition datasets. Results show that the proposed approach outperforms state-of-the-art methods. © 2020 Institute of Electrical and Electronics Engineers Inc. All rights reserved.
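The abstract describes a transposed convolution as a regular convolution whose spatial transformation is reversed, i.e. it upsamples a feature map rather than downsampling it. A minimal NumPy sketch of that operation is shown below; it is only an illustration of the layer type, not the authors' implementation, and the kernel size and stride are chosen arbitrarily:

```python
import numpy as np

def conv_transpose2d(x, kernel, stride=2):
    """Transposed 2D convolution: each input value "stamps" a scaled copy
    of the kernel onto the output, growing the spatial size by ~stride x
    (the reverse of a strided convolution's downsampling)."""
    h, w = x.shape
    kh, kw = kernel.shape
    out_h = (h - 1) * stride + kh
    out_w = (w - 1) * stride + kw
    out = np.zeros((out_h, out_w))
    for i in range(h):
        for j in range(w):
            out[i * stride:i * stride + kh,
                j * stride:j * stride + kw] += x[i, j] * kernel
    return out

# A 4x4 feature map upsampled with a 3x3 kernel and stride 2:
# output size is (4 - 1) * 2 + 3 = 9 along each axis.
y = conv_transpose2d(np.ones((4, 4)), np.ones((3, 3)))
print(y.shape)  # (9, 9)
```

In the paper's architecture, the output of such layers is fed to a bidirectional RNN to produce frame-level event activity predictions.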
dc.identifier.citation: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 2020, Vol. 2020-May, p. 661-665
dc.identifier.issn: 0736-7791; 1520-6149
dc.identifier.uri: https://doi.org/10.1109/ICASSP40776.2020.9054628
dc.identifier.uri: https://idr.nitk.ac.in/handle/123456789/30696
dc.publisher: Institute of Electrical and Electronics Engineers Inc.
dc.subject: Convolutional Neural Networks (CNN)
dc.subject: Deep Neural Networks (DNN)
dc.subject: Instantaneous Frequency spectrogram (IFgram)
dc.subject: Recurrent Neural Networks (RNN)
dc.subject: Sound Event Detection (SED)
dc.subject: Transposed CNN (TCNN)
dc.title: Polyphonic sound event detection using transposed convolutional recurrent neural network
