Polyphonic sound event detection using transposed convolutional recurrent neural network
Date
2020
Authors
Chatterjee C.C.
Mulimani M.
Koolagudi S.G.
Abstract
In this paper, we propose a Transposed Convolutional Recurrent Neural Network (TCRNN) architecture for polyphonic sound event recognition. A transposed convolution layer, which carries out a regular convolution operation but reverts the spatial transformation, is combined with a bidirectional Recurrent Neural Network (RNN) to form the TCRNN. Instead of the traditional mel spectrogram features, the proposed methodology uses mel-IFgram (Instantaneous Frequency spectrogram) features. The performance of the proposed approach is evaluated on sound events from the publicly available TUT-SED 2016 dataset and the Joint sound scene and polyphonic sound event recognition dataset. Results show that the proposed approach outperforms state-of-the-art methods. © 2020 Institute of Electrical and Electronics Engineers Inc. All rights reserved.
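To illustrate what "reverts the spatial transformation" means, the following is a minimal sketch, not the authors' implementation. It uses the standard output-length formulas for a 1-D convolution and its transposed counterpart, plus a naive transposed convolution over plain Python lists. The function names and the kernel, stride, and padding values are illustrative assumptions; the abstract does not state the layer configuration used in TCRNN.

```python
# Illustrative sketch only: names and hyperparameters are assumptions,
# not taken from the paper.

def conv1d_out_len(n, kernel, stride=1, padding=0):
    """Output length of a regular 1-D convolution over n input samples."""
    return (n + 2 * padding - kernel) // stride + 1


def conv_transpose1d_out_len(n, kernel, stride=1, padding=0):
    """Output length of a 1-D transposed convolution (no output padding)."""
    return (n - 1) * stride - 2 * padding + kernel


def conv_transpose1d(x, w, stride=1):
    """Naive 1-D transposed convolution without padding.

    Each input sample scatters a scaled copy of the kernel w into the
    output, starting at position i * stride; overlapping copies are summed.
    """
    out = [0.0] * conv_transpose1d_out_len(len(x), len(w), stride)
    for i, xi in enumerate(x):
        for j, wj in enumerate(w):
            out[i * stride + j] += xi * wj
    return out


# A strided convolution shrinks 40 frames to 20; the transposed
# convolution with the same settings maps 20 frames back to 40.
down = conv1d_out_len(40, kernel=4, stride=2, padding=1)
up = conv_transpose1d_out_len(down, kernel=4, stride=2, padding=1)
upsampled = conv_transpose1d([1.0, 2.0], [1.0, 1.0, 1.0], stride=2)
```

Only the shape is restored here, not the original values: a transposed convolution is the shape-wise inverse of a convolution, not its numerical inverse. In a TCRNN-style model, the resulting feature maps would then be passed to a bidirectional RNN, which this sketch does not cover.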
Citation
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, Vol. 2020-May, pp. 661-665