Acoustic Scene Classification Using Speech Features
Date
2020
Authors
Mulimani, Manjunath.
Publisher
National Institute of Technology Karnataka, Surathkal
Abstract
Currently, smart devices such as smartphones, laptops, and tablets need human intervention to deliver their services effectively. They are capable of recognizing speech, music, images, characters, and so on. To make smart systems behave intelligently, we need to build into them the capacity to understand the surrounding situation and respond to it accordingly, without human intervention.
Enabling devices to sense the environment in which they are present through analysis of sound is the main objective of Acoustic Scene Classification. The initial step in analyzing the surroundings is recognition of the acoustic events present in day-to-day environments. Such acoustic events are broadly categorized into two types: monophonic and polyphonic. Monophonic acoustic events correspond to non-overlapped events; in other words, at most one acoustic event is active at a given time. Polyphonic acoustic events correspond to overlapped events; in other words, multiple acoustic events occur at the same time instant. In this work, we aim to develop systems for the automatic recognition of monophonic and polyphonic acoustic events along with the corresponding acoustic scene. Applications of this research work include context-aware mobile devices, robots, intelligent monitoring systems, assistive technologies such as hearing aids, and so on.
Some of the important issues in this research area are: identifying acoustic-event-specific features for acoustic event characterization and recognition, optimizing the existing algorithms, developing robust mechanisms for acoustic event recognition in noisy environments, making state-of-the-art methods work on big data, developing a joint model that recognizes acoustic events followed by the corresponding scenes, and so on. Existing approaches towards solutions have major limitations: the use of traditional speech features, which are sensitive to noise; the use of features from two-dimensional Time-Frequency Representations (TFRs) for recognizing acoustic events, which demands high computational time; and the use of deep learning models, which require substantially large amounts of training data.
Several novel approaches are presented in this thesis for the recognition of monophonic acoustic events, polyphonic acoustic events, and acoustic scenes. Two main challenges associated with real-time Acoustic Event Classification (AEC) are addressed in this thesis. The first is the effective recognition of acoustic events in noisy environments, and the second is the use of the MapReduce programming model on a Hadoop distributed environment to reduce computational complexity.
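To illustrate how such a distributed feature-extraction step might be organized, the sketch below shows a Hadoop Streaming style mapper in Python that reads one audio file path per input line and emits per-file log-mel spectrogram statistics for a downstream reducer. The use of librosa, the file-path input format, and the parameter values are assumptions made for illustration; they are not details taken from the thesis.

#!/usr/bin/env python3
# Illustrative Hadoop Streaming mapper (sketch only, assumed setup):
# each input line is a path to an audio file; the mapper emits
# <path, serialized log-mel feature summary> pairs on standard output.
import sys
import json
import numpy as np
import librosa

SR = 44100      # assumed sampling rate
N_MELS = 40     # assumed number of mel bands

def extract_logmel(path):
    # Load the recording and compute a log-scaled mel spectrogram.
    y, sr = librosa.load(path, sr=SR, mono=True)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=2048,
                                         hop_length=1024, n_mels=N_MELS)
    return librosa.power_to_db(mel, ref=np.max)

if __name__ == "__main__":
    for line in sys.stdin:
        path = line.strip()
        if not path:
            continue
        logmel = extract_logmel(path)
        # Emit per-band mean and standard deviation so a reducer can aggregate them.
        summary = {"mean": logmel.mean(axis=1).tolist(),
                   "std": logmel.std(axis=1).tolist()}
        print(f"{path}\t{json.dumps(summary)}")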
In this thesis, features are extracted from spectrograms, and these are more robust than the traditional speech features. Further, an improved Convolutional Recurrent Neural Network (CRNN) and Deep Neural Network-driven feature learning models are proposed for Polyphonic Acoustic Event Detection (AED) in real-life recordings. Finally, binaural features are explored to train a Kervolutional Recurrent Neural Network (KRNN), which recognizes both the acoustic events and the respective scene of an audio signal. A detailed experimental evaluation is carried out to compare the performance of each of the proposed approaches against baseline and state-of-the-art systems.
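The sketch below gives a minimal Keras CRNN of the kind commonly used for polyphonic AED on log-mel spectrogram input, with a convolutional front end, a bidirectional recurrent layer, and frame-wise sigmoid outputs so that overlapping events can be active simultaneously. The framework, layer sizes, and class count are illustrative assumptions and do not reproduce the improved CRNN or the KRNN proposed in the thesis.

# Minimal CRNN sketch for polyphonic acoustic event detection (assumed design).
# Input: log-mel spectrogram (frames x mel bands x 1); output: per-frame,
# multi-label event activity, so several events can be active at once.
import tensorflow as tf
from tensorflow.keras import layers, models

N_FRAMES, N_MELS, N_CLASSES = 500, 40, 6   # illustrative values

def build_crnn():
    inputs = layers.Input(shape=(N_FRAMES, N_MELS, 1))
    x = inputs
    # Convolutional front end: learns local time-frequency patterns,
    # pooling only along the frequency axis to preserve frame resolution.
    for filters in (64, 64):
        x = layers.Conv2D(filters, (3, 3), padding="same", activation="relu")(x)
        x = layers.BatchNormalization()(x)
        x = layers.MaxPooling2D(pool_size=(1, 2))(x)
    # Collapse the frequency axis so each frame becomes a feature vector.
    x = layers.Reshape((N_FRAMES, -1))(x)
    # Recurrent part: models temporal context across frames.
    x = layers.Bidirectional(layers.GRU(64, return_sequences=True))(x)
    # Frame-wise multi-label output: one sigmoid per event class.
    outputs = layers.TimeDistributed(
        layers.Dense(N_CLASSES, activation="sigmoid"))(x)
    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model

model = build_crnn()
model.summary()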
Keywords
Department of Computer Science & Engineering, Monophonic Acoustic Event Classification (AEC), Polyphonic Acoustic Event Detection (AED), Acoustic Scene Classification (ASC), Time-Frequency Representations (TFRs), MapReduce programming model, Convolutional Recurrent Neural Network (CRNN), Kervolutional Recurrent Neural Network (KRNN)