Polyphonic sound event detection using transposed convolutional recurrent neural network
Date
2020
Journal Title
Journal ISSN
Volume Title
Publisher
Institute of Electrical and Electronics Engineers Inc.
Abstract
In this paper we propose a Transposed Convolutional Recurrent Neural Network (TCRNN) architecture for polyphonic sound event recognition. A transposed convolution layer, which carries out a regular convolution operation but reverses the spatial transformation, is combined with a bidirectional Recurrent Neural Network (RNN) to form the TCRNN. Instead of the traditional mel spectrogram features, the proposed methodology incorporates mel-IFgram (Instantaneous Frequency spectrogram) features. The performance of the proposed approach is evaluated on sound events of the publicly available TUT-SED 2016 and Joint sound scene and polyphonic sound event recognition datasets. Results show that the proposed approach outperforms state-of-the-art methods. © 2020 Institute of Electrical and Electronics Engineers Inc. All rights reserved.
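The abstract's key building block, the transposed convolution, can be illustrated in one dimension: each input sample is spread across the output at a given stride, reversing the spatial downsampling that a strided convolution would apply. The sketch below is an illustrative NumPy implementation under our own simplifying assumptions (1-D signal, single channel, no padding); it is not the paper's TCRNN, which stacks such layers with a bidirectional RNN.

```python
import numpy as np

def transposed_conv1d(x, kernel, stride=2):
    """Minimal 1-D transposed convolution (illustrative sketch).

    Each input sample x[i] deposits a scaled copy of the kernel at
    output position i * stride, so the output is spatially larger
    than the input -- the reverse of a strided convolution.
    """
    n, k = len(x), len(kernel)
    out = np.zeros(stride * (n - 1) + k)
    for i, v in enumerate(x):
        out[i * stride : i * stride + k] += v * kernel
    return out

x = np.array([1.0, 2.0, 3.0])
y = transposed_conv1d(x, np.array([1.0, 1.0]), stride=2)
# A 3-sample input becomes a 6-sample output: 2 * (3 - 1) + 2 = 6.
print(y)  # [1. 1. 2. 2. 3. 3.]
```

With a learned kernel (e.g. `nn.ConvTranspose2d` in PyTorch), the same mechanism upsamples time-frequency feature maps inside a network.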
Description
Keywords
Convolution Neural Networks (CNN), Deep Neural Networks (DNN), Instantaneous Frequency spectrogram (IFgram), Recurrent Neural Networks (RNN), Sound Event Detection (SED), Transposed CNN (TCNN)
Citation
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 2020, Vol. 2020-May, p. 661-665
