Polyphonic sound event localization and detection using channel-wise FusionNet

dc.contributor.author: V, S.
dc.contributor.author: Koolagudi, S.G.
dc.date.accessioned: 2026-02-04T12:25:04Z
dc.date.issued: 2024
dc.description.abstract: Sound Event Localization and Detection (SELD) is the task of localizing sound events in space and time and classifying them. Multitask models are commonly used to perform SELD. In this work, a deep learning network model named channel-wise ‘FusionNet’ is designed to perform the SELD task. A novel fusion layer is introduced into a regular Deep Neural Network (DNN): the input is fed channel-wise, and the outputs of all channels are fused to form a new feature representation. The key contribution of this work is a neural network model that retains the channel-wise information from the multichannel input along with the spatial and temporal information. The proposed network uses separable convolution blocks in the convolution layers; therefore, the complexity of the model is low in terms of both time and space. The features used as input are Mel-band energies for Sound Event Detection (SED) and intensity vectors for Direction-of-Arrival (DOA) estimation. The proposed network’s fusion layer provides a better representation of features for both the SED and DOA estimation tasks. Experiments are performed on the First-order Ambisonic (FOA) array format recordings of the TAU-NIGENS Spatial Sound Events 2020 dataset. Improved performance in terms of Error Rate (ER), DOA error, and Frame Recall (FR) is observed in comparison to state-of-the-art SELD systems. © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.
dc.identifier.citation: Applied Intelligence, 2024, 54, 6, pp. 5015-5026
dc.identifier.issn: 0924669X
dc.identifier.uri: https://doi.org/10.1007/s10489-024-05438-6
dc.identifier.uri: https://idr.nitk.ac.in/handle/123456789/21239
dc.publisher: Springer
dc.subject: Convolution
dc.subject: Deep neural networks
dc.subject: Learning systems
dc.subject: Multilayer neural networks
dc.subject: Neural network models
dc.subject: Direction-of-arrival
dc.subject: Direction-of-arrival (DOA)
dc.subject: Event localizations
dc.subject: Events detection
dc.subject: Fusion layers
dc.subject: Fusionnet
dc.subject: Polyphonic sound event detection
dc.subject: Polyphonic sounds
dc.subject: Sound event detection
dc.subject: Sound event localization and detection
dc.subject: Sound events
dc.subject: Direction of arrival
dc.title: Polyphonic sound event localization and detection using channel-wise FusionNet
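
The abstract attributes the model's low time and space complexity to separable convolution blocks. As a rough illustration of why that can hold, the sketch below compares the weight count of a standard 2-D convolution with that of a depthwise separable one (a depthwise k x k filter per input channel, then a 1 x 1 pointwise mix). The layer sizes (64 input channels, 64 output channels, 3 x 3 kernel) and the function names are hypothetical, chosen only for the arithmetic; they are not taken from the paper, and biases are ignored.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depthwise separable convolution (bias ignored).

    Depthwise step: one k x k filter per input channel.
    Pointwise step: a 1 x 1 convolution mixing c_in channels into c_out.
    """
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer sizes, not values reported in the paper.
    c_in, c_out, k = 64, 64, 3
    std = standard_conv_params(c_in, c_out, k)
    sep = separable_conv_params(c_in, c_out, k)
    print(f"standard:  {std} weights")
    print(f"separable: {sep} weights")
    print(f"reduction: {std / sep:.2f}x")
```

For these assumed sizes the separable layer needs 4672 weights against 36864 for the standard one. The saving grows with the kernel size and the number of output channels.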
