Faculty Publications

Permanent URI for this community: https://idr.nitk.ac.in/handle/123456789/18736

Publications by NITK Faculty


Now showing 1 - 9 of 9
  • Item
    A Transfer Learning Approach for Diabetic Retinopathy Classification Using Deep Convolutional Neural Networks
    (Institute of Electrical and Electronics Engineers Inc., 2018) Krishnan, A.S.; Clive, D.R.; Bhat, V.; Ramteke, P.B.; Koolagudi, S.G.
Diabetic Retinopathy is a disease in which the retina is damaged due to diabetes mellitus. It is a leading cause of blindness today. Detection and quantification of this retinal damage from fundus images is tedious and requires expertise. In this paper, an automatic identification of the severity of Diabetic Retinopathy using Convolutional Neural Networks (CNNs) with a transfer learning approach is proposed to aid the diagnostic process. A comparison of different CNN architectures, such as ResNet and Inception-ResNet-v2, is performed using the quadratic weighted kappa metric. The qualitative and quantitative evaluation of the proposed approach is carried out on the Diabetic Retinopathy detection dataset from Kaggle. From the results, we observe that the proposed model achieves a kappa score of 0.76. © 2018 IEEE.
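The quadratic weighted kappa metric used in the abstract above scores grader agreement while penalizing disagreements by the square of the distance between severity grades. A minimal sketch of the computation (the function name and layout are illustrative, not the authors' code):

```python
import numpy as np

def quadratic_weighted_kappa(y_true, y_pred, n_classes):
    """Agreement between two graders, weighted by squared grade distance."""
    observed = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        observed[t, p] += 1
    idx = np.arange(n_classes)
    # Quadratic penalty: 0 on the diagonal, 1 for maximally distant grades.
    weights = (idx[:, None] - idx[None, :]) ** 2 / (n_classes - 1) ** 2
    # Expected confusion matrix under chance agreement, same total count.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()
    return 1.0 - (weights * observed).sum() / (weights * expected).sum()
```

Perfect agreement yields 1.0; systematic large disagreements push the score negative, which is why a 0.76 on a five-grade task indicates substantial agreement.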
  • Item
    Retinal-Layer Segmentation Using Dilated Convolutions
    (Springer Science and Business Media Deutschland GmbH, 2020) Guru Pradeep Reddy, T.; Ashritha, K.S.; Prajwala, T.M.; Girish, G.N.; Kothari, A.R.; Koolagudi, S.G.; Rajan, J.
Visualization and analysis of Spectral Domain Optical Coherence Tomography (SD-OCT) cross-sectional scans have gained a lot of importance in the diagnosis of several retinal abnormalities. Quantitative analytic techniques such as retinal thickness and volumetric analysis are performed on cross-sectional images of the retina for early diagnosis and prognosis of retinal diseases. However, segmentation of retinal layers from OCT images is a complicated task owing to factors such as speckle noise, low image contrast and low signal-to-noise ratio, amongst many others. Owing to the importance of retinal layer segmentation in diagnosing ophthalmic diseases, manual segmentation techniques have been proposed and adopted in clinical practice. Nonetheless, manual segmentations suffer from erroneous boundary detection. This paper thus proposes a fully automated semantic segmentation technique that uses an encoder–decoder architecture to accurately segment the prominent retinal layers. © 2020, Springer Nature Singapore Pte Ltd.
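The dilated convolutions named in the title above enlarge a filter's receptive field without adding parameters, by sampling the input at spaced intervals. A minimal 1-D sketch of the idea (illustrative, not the paper's implementation):

```python
def dilated_conv1d(x, kernel, dilation=1):
    """Valid 1-D convolution (correlation form) with a dilated kernel."""
    k = len(kernel)
    span = (k - 1) * dilation + 1  # effective receptive field in samples
    return [
        sum(kernel[j] * x[i + j * dilation] for j in range(k))
        for i in range(len(x) - span + 1)
    ]
```

With dilation 2, a 3-tap kernel covers 5 input samples instead of 3, so stacked dilated layers see wide context at low cost.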
  • Item
    A Transpose-SELDNet for Polyphonic Sound Event Localization and Detection
    (Institute of Electrical and Electronics Engineers Inc., 2023) Spoorthy, V.; Koolagudi, S.G.
Human beings have the ability to identify a particular event occurring in their surroundings based on sound cues, even when no visual scene is presented. Sound events are the auditory cues present in an environment. Sound event detection (SED) is the process of determining the beginning and end of sound events as well as a textual label for each event. Sound source localization (SSL) refers to identifying the spatial location of a sound occurrence in addition to the SED. The integrated task of SED and SSL is known as Sound Event Localization and Detection (SELD). In this work, three different deep learning architectures are explored to perform SELD: SELDNet, D-SELDNet (Depthwise Convolution), and T-SELDNet (Transpose Convolution). Two sets of features are used to perform the SED and Direction-of-Arrival (DOA) estimation tasks. D-SELDNet uses a depthwise convolution layer, which reduces the model's complexity in terms of computation time. T-SELDNet uses transpose convolution, which helps learn better discriminative features by retaining the input size and not losing necessary information from the input. The proposed method is evaluated on the First-order Ambisonic (FOA) array format of the TAU-NIGENS Spatial Sound Events 2020 dataset. An improvement over existing SELD systems has been observed with the proposed T-SELDNet. © 2023 IEEE.
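The two convolution variants that distinguish D-SELDNet and T-SELDNet can be illustrated with simple arithmetic: depthwise-separable convolution cuts parameter counts sharply, while transpose convolution upsamples by scattering each input through the kernel. A sketch under those general definitions (shapes and names are illustrative, not from the paper):

```python
def conv2d_params(c_in, c_out, k):
    """Parameter count of a standard k x k convolution."""
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    """Depthwise k x k filters plus a 1 x 1 pointwise mixing layer."""
    return c_in * k * k + c_in * c_out

def transpose_conv1d(x, kernel, stride):
    """Upsample x by scattering each input value through the kernel."""
    k = len(kernel)
    out = [0.0] * ((len(x) - 1) * stride + k)
    for i, v in enumerate(x):
        for j in range(k):
            out[i * stride + j] += v * kernel[j]
    return out
```

For 32-in/64-out 3x3 filters, the depthwise-separable version needs roughly an eighth of the parameters, which is the efficiency D-SELDNet exploits.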
  • Item
    Polyphonic Sound Event Detection Using Modified Recurrent Temporal Pyramid Neural Network
    (Springer Science and Business Media Deutschland GmbH, 2024) Venkatesh, S.; Koolagudi, S.G.
In this paper, a novel approach to performing polyphonic Sound Event Detection (SED) is presented. A new deep learning architecture named “Modified Recurrent Temporal Pyramid Neural Network (MR-TPNN)” is introduced. The input features fed to the network are spectrograms generated from the Constant Q-Transform (CQT). CQT spectrograms provided better sound event information in the audio recordings than the Short-Time Fourier Transform (STFT) and Fast Fourier Transform (FFT) methods. Temporal information is an essential factor for detecting the onset and offset of events in an audio recording. Capturing the temporal information is ensured by fusing temporal pyramids and Bi-directional Long Short-Term Memory (LSTM) recurrent layers in the deep learning architecture. Extensive experiments are carried out on three benchmark datasets, and the results of the proposed method are superior to those of existing polyphonic SED systems. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.
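The Constant Q-Transform referenced above uses geometrically spaced center frequencies (a constant ratio of frequency to bandwidth, Q), unlike the linearly spaced bins of the STFT. A small sketch of that bin spacing (values are illustrative):

```python
def cqt_center_freqs(f_min, bins_per_octave, n_bins):
    """Geometrically spaced center frequencies: f_k = f_min * 2**(k / B)."""
    return [f_min * 2 ** (k / bins_per_octave) for k in range(n_bins)]
```

With 12 bins per octave, each bin sits a constant factor 2**(1/12) above the last, so low frequencies get fine resolution, matching how pitched sound events are distributed.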
  • Item
    A Deep Ensemble Learning-Based CNN Architecture for Multiclass Retinal Fluid Segmentation in OCT Images
    (Institute of Electrical and Electronics Engineers Inc., 2023) Rahil, M.; Anoop, B.N.; Girish, G.N.; Kothari, A.R.; Koolagudi, S.G.; Rajan, J.
Retinal fluids (fluid collections) develop because of the accumulation of fluid in the retina, which may be caused by several retinal disorders and can lead to loss of vision. Optical coherence tomography (OCT) provides non-invasive cross-sectional images of the retina and enables the visualization of different retinal abnormalities. The identification and segmentation of retinal cysts from OCT scans is gaining immense attention, since manual analysis of OCT data is time-consuming and requires an experienced ophthalmologist. Identification and categorization of retinal cysts aid in establishing the pathophysiology of various retinal diseases, such as macular edema, diabetic macular edema, and age-related macular degeneration. Hence, an automatic algorithm for the segmentation and detection of retinal cysts would be of great value to ophthalmologists. In this study, we have proposed a convolutional neural network-based deep ensemble architecture that can segment three different types of retinal cysts from retinal OCT images. The quantitative and qualitative performance of the model was evaluated using the publicly available RETOUCH challenge dataset. The proposed model outperformed the state-of-the-art methods, with an overall improvement of 1.8%. © 2013 IEEE.
  • Item
    Bi-level Acoustic Scene Classification Using Lightweight Deep Learning Model
    (Birkhauser, 2024) Spoorthy, V.; Koolagudi, S.G.
Identifying a scene based on the environment in which the related audio is recorded is known as acoustic scene classification (ASC). In this paper, a bi-level lightweight Convolutional Neural Network (CNN)-based model is presented to perform ASC. The proposed approach performs classification in two levels. In the first level, the scenes are classified into three broad categories: indoor, outdoor, and transportation scenes. In the second level, the three classes are further categorized into individual scenes. The proposed approach is implemented using three features: log Mel band energies, harmonic spectrograms, and percussive spectrograms. To perform the classification, three CNN classifiers are used, namely MobileNetV2, Squeeze-and-Excitation Net (SENet), and a combination of these two architectures, known as SE-MobileNet. The combined model leverages the advantages of both the MobileNetV2 and SENet architectures. Extensive experiments are conducted on the DCASE 2020 (IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events) Task 1B development and DCASE 2016 ASC datasets. The proposed SE-MobileNet model resulted in a classification accuracy of 96.9% and 86.6% for the first and second levels, respectively, on the DCASE 2020 dataset, and 97.6% and 88.4%, respectively, on the DCASE 2016 dataset. The proposed model is reported to be better in terms of both complexity and accuracy as compared to the state-of-the-art low-complexity ASC systems. © 2023, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
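The Squeeze-and-Excitation mechanism shared by SENet and SE-MobileNet recalibrates feature channels: each channel is squeezed to a scalar by global average pooling, the resulting vector passes through two small fully connected layers, and the sigmoid outputs gate the channels. A minimal numpy sketch (the weights are placeholders, not trained parameters):

```python
import numpy as np

def se_block(x, w1, w2):
    """Squeeze-and-Excitation: x is (C, H, W); w1, w2 are FC weight matrices."""
    z = x.mean(axis=(1, 2))              # squeeze: one scalar per channel
    s = np.maximum(w1 @ z, 0.0)          # excitation FC1 + ReLU
    s = 1.0 / (1.0 + np.exp(-(w2 @ s)))  # excitation FC2 + sigmoid gates
    return x * s[:, None, None]          # channel-wise rescaling
```

The block adds only two tiny matrix multiplies per stage, which is why it pairs well with a low-complexity backbone like MobileNetV2.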
  • Item
    InDS: Intelligent DRL Strategy for Effective Virtual Network Embedding of an Online Virtual Network Requests
    (Institute of Electrical and Electronics Engineers Inc., 2024) Keerthan Kumar, T.G.K.; Addya, S.K.; Koolagudi, S.G.
Network virtualization is a demanding feature in the evolution of future Internet architectures. It enables on-demand virtualized resource provisioning for heterogeneous Virtual Network Requests (VNRs) from diverse end users over the underlying substrate network. While network virtualization provides various benefits, such as service separation, improved Quality of Service, security, and more efficient resource usage, it also introduces significant research challenges. One major challenge is allocating substrate network resources to VNR components such as virtual machines and virtual links, known as virtual network embedding, which is proven to be NP-hard. Most existing works addressing the virtual network embedding problem are 1) single-objective, 2) fail to address dynamic and time-varying network states, and 3) neglect network-specific features. These limitations hinder the performance of existing approaches. This work introduces an embedding framework called Intelligent Deep Reinforcement Learning (DRL) Strategy for effective virtual network embedding of online VNRs (InDS). The proposed InDS uses an actor-critic model based on a DRL architecture and Graph Convolutional Networks (GCNs). The GCN effectively captures dependencies between the VNRs and substrate network nodes by extracting both network- and system-specific features. In DRL, the asynchronous advantage actor-critic agents learn policies from these features during training to decide which virtual machines to embed on which servers over time. The actor-critic model helps in efficiently learning optimal policies in complex environments. The suggested reward function considers multiple objectives and guides the learning process effectively.
Evaluation of simulation results shows the effectiveness of InDS in achieving optimal resource allocation and addressing diverse objectives, including minimizing congestion and maximizing the acceptance and revenue-to-cost ratios. InDS achieves a 28% higher acceptance ratio and a 45% higher revenue-to-cost ratio compared to existing baseline works by effectively managing network congestion. © 2013 IEEE.
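A multi-objective reward of the kind the abstract describes, trading off request acceptance, revenue-to-cost ratio, and congestion, might be sketched as follows (the weights and functional form are assumptions for illustration, not the paper's actual reward):

```python
def vne_reward(accepted, revenue, cost, congestion,
               w_accept=1.0, w_r2c=1.0, w_cong=1.0):
    """Hypothetical multi-objective reward for a VNE agent."""
    if not accepted:
        return -1.0  # penalize rejected requests
    # Reward acceptance, reward revenue per unit cost, penalize congestion.
    return w_accept + w_r2c * (revenue / cost) - w_cong * congestion
```

Shaping the reward this way lets a single actor-critic agent balance all three objectives instead of optimizing acceptance alone.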
  • Item
    Rare Sound Event Detection Using Multi-resolution Cochleagram Features and CRNN with Attention Mechanism
    (Birkhauser, 2025) Pandey, G.; Koolagudi, S.G.
Acoustic event detection (AED), or sound event detection (SED), is the problem of automatically detecting acoustic events in an audio recording along with their onset and offset times. Rare acoustic event detection is a challenging sub-problem of AED that aims to detect rare but significant sound events in an audio signal. Traditional SED methods often struggle to accurately detect rare sound events due to their infrequent occurrence and diverse characteristics. This paper introduces novel features, named multi-resolution cochleagrams (MRCGs), for rare SED tasks. Cochleagrams at different resolutions are extracted from the audio recording and stacked to form the MRCG feature vector. The equivalent rectangular bandwidth (ERB) scale used in the cochleagram simulates the human auditory filter. The classifier used is a convolutional recurrent neural network (CRNN) embedded with an attention module. This work considers the DCASE 2017 Task 2 dataset for detecting rare sound events. Results show that the combination of the proposed MRCG features and the CRNN with attention improves performance, achieving an average error rate of 0.11 and an average F1 score of 94.3%. © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2025.
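The multi-resolution idea above amounts to stacking a base cochleagram with copies smoothed at coarser resolutions, so the classifier sees both fine detail and broad context. A minimal sketch of that stacking step (window sizes and the mean filter are illustrative; an actual cochleagram comes from a gammatone filterbank on the ERB scale):

```python
import numpy as np

def smooth(cg, size):
    """Mean-filter a time-frequency matrix with a size x size window."""
    padded = np.pad(cg, size // 2, mode="edge")
    out = np.empty_like(cg, dtype=float)
    for i in range(cg.shape[0]):
        for j in range(cg.shape[1]):
            out[i, j] = padded[i:i + size, j:j + size].mean()
    return out

def mrcg_stack(cochleagram, sizes=(11, 23)):
    """Stack the base cochleagram with coarser, smoothed copies."""
    return np.stack([cochleagram] + [smooth(cochleagram, s) for s in sizes])
```

Each smoothed copy keeps the original shape, so the stack forms a multi-channel input a CRNN can consume directly.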
  • Item
    Video forgery localization using inter-frame denoising and intra-frame segmentation
    (Springer, 2025) Banerjee, D.; Chittaragi, N.B.; Koolagudi, S.G.
Video forgery detection has become necessary with the recent spurt in fake videos, such as Deepfakes and doctored videos, from multiple video-capturing devices. In this paper, we provide a novel technique for detecting fake videos by creating an ensemble network, based on statistical and deep learning methods, that detects inter-frame and intra-frame forgery in forged videos separately. Noise signature extraction of a particular image-capturing sensor and an autoencoder-based Convolutional Neural Network (CNN) model are used to localize the forged regions. We have trained the model to localize Deepfake video forgeries as well as copy-paste forgeries, with effective results on the test data. The proposed fake video detector can be applied at the back end of online video-aggregation services to verify the genuineness of videos. The results achieved show better performance in detecting fake videos compared to existing methods. © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2025.
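The sensor-noise-signature idea above generally rests on computing a noise residual (a frame minus a denoised copy of itself) and correlating it against a camera's reference pattern; a weak correlation flags a region as potentially forged. A minimal sketch with a mean-filter denoiser (illustrative only, not the paper's pipeline):

```python
import numpy as np

def noise_residual(frame, k=3):
    """Frame minus a mean-filtered (denoised) copy of itself."""
    pad = k // 2
    padded = np.pad(frame, pad, mode="edge")
    denoised = np.empty_like(frame, dtype=float)
    for i in range(frame.shape[0]):
        for j in range(frame.shape[1]):
            denoised[i, j] = padded[i:i + k, j:j + k].mean()
    return frame - denoised

def ncc(a, b):
    """Normalized cross-correlation; near 1 when noise patterns match."""
    a = a - a.mean()
    b = b - b.mean()
    return (a * b).sum() / (np.sqrt((a * a).sum() * (b * b).sum()) + 1e-12)
```

Spliced regions carry a different sensor's residual, so their correlation with the expected camera pattern drops, which is what localization exploits.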