Rare sound event detection using superlets and a convolutional TDPANet

Pandey, G.; Koolagudi, S.G.

dc.contributor.author	Pandey, G.
dc.contributor.author	Koolagudi, S.G.
dc.date.accessioned	2026-02-03T13:19:21Z
dc.date.issued	2025
dc.description.abstract	Rare Sound Event Detection (RSED) focuses on identifying infrequent but significant sound events in audio recordings with precise onset and offset times. It is crucial for applications such as surveillance, healthcare, and environmental monitoring. An essential component of RSED systems is the extraction of an effective time-frequency representation as input features. These features capture short, transient acoustic events in an audio recording, even in noisy and complex environments. Most existing approaches to RSED use time-frequency representations such as the Mel spectrogram, Constant-Q Transform (CQT), and Continuous Wavelet Transform (CWT) as input features. However, these representations trade off time resolution against frequency resolution, which limits their ability to capture the fine-grained details needed to detect these events in complex acoustic environments. To overcome this limitation, we introduce superlets, a novel time-frequency representation that offers super-resolution in both the time and frequency domains. To process the high-resolution superlet features, we also propose a Convolutional Temporal Dilated Pyramid Attention Network (TDPANet). This novel neural network architecture combines convolutional feature extraction, dilated temporal modeling, multi-scale temporal pooling, and temporal attention mechanisms to improve event detection accuracy. We evaluate our method on the DCASE 2017 Task 2 rare sound event dataset, which includes isolated sound events and real-world acoustic scenes. Experimental results show that the proposed method significantly outperforms state-of-the-art techniques, achieving an Error Rate (ER) of 0.15 and an F1-score of 92.3%, demonstrating its effectiveness in detecting rare sound events. © The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2025.
dc.identifier.citation	Signal, Image and Video Processing, 2025, 19, 10, pp. -
dc.identifier.issn	18631703
dc.identifier.uri	https://doi.org/10.1007/s11760-025-04420-0
dc.identifier.uri	https://idr.nitk.ac.in/handle/123456789/20050
dc.publisher	Springer Science and Business Media Deutschland GmbH
dc.subject	Audio acoustics
dc.subject	Audio recordings
dc.subject	Convolution
dc.subject	Neural networks
dc.subject	Wavelet transforms
dc.subject	Acoustic event detection
dc.subject	Acoustic event detections
dc.subject	Convolutional temporal dilated pyramid attention network
dc.subject	Input features
dc.subject	Rare sound event detection
dc.subject	Sound event detection
dc.subject	Sound events
dc.subject	Superlet transform
dc.subject	Time-frequency representations
dc.subject	Trade off
dc.subject	Economic and social effects
dc.title	Rare sound event detection using superlets and a convolutional TDPANet

Collections

Journal Articles
