Journal Articles

Permanent URI for this collection: https://idr.nitk.ac.in/handle/123456789/19884

Multi-Res-Attention UNet: A CNN Model for the Segmentation of Focal Cortical Dysplasia Lesions from Magnetic Resonance Images
(Institute of Electrical and Electronics Engineers Inc., 2021) Thomas, E.; Pawan, S.J.; Kumar, S.; Horo, A.; Niyas, S.; Vinayagamani, S.; Kesavadas, C.; Rajan, J.
In this work, we focus on the segmentation of Focal Cortical Dysplasia (FCD) regions from MRI images. FCD is a congenital malformation of brain development that is considered the most common cause of intractable epilepsy in adults and children. To our knowledge, the latest work on the automatic segmentation of FCD used a fully convolutional neural network (FCN) model based on UNet. While there is no doubt that the model outperformed conventional image processing techniques by a considerable margin, it suffers from several pitfalls. First, it does not account for the large semantic gap of the feature maps passed from the encoder to the decoder layer through the long skip connections. Second, it fails to leverage the salient features that represent complex FCD lesions and to suppress most of the irrelevant features in the input sample. We propose Multi-Res-Attention UNet, a novel hybrid skip connection-based FCN architecture that addresses these drawbacks. Moreover, we trained it from scratch for the detection of FCD from 3 T MRI 3D FLAIR images and conducted 5-fold cross-validation to evaluate the model. An FCD detection rate (recall) of 92% was achieved in patient-wise analysis. © 2013 IEEE.
Video summarization and captioning using dynamic mode decomposition for surveillance
(Springer Science and Business Media B.V., 2021) Radarapu, R.; Gopal, A.S.S.; Nh, M.; Anand Kumar, M.
Video surveillance has become a major tool in security maintenance. However, reviewing recorded footage to detect motion can be tedious, because motion typically occurs in only a short portion of the video. Much time is wasted analyzing the video, and it is not always possible to find the exact frame where a transition occurred. There is therefore a need for a summary video that captures any changes or motion. With advances in image processing using OpenCV and in deep learning, video summarization is no longer an impossible task. Captions are generated for the summarized videos using an encoder–decoder captioning model. With the help of large, well-labeled data sets such as Common Objects in Context and Microsoft Video Description, video captioning is a feasible task. Encoder–decoder models have been used extensively to generate text from visual features since the arrival of long short-term memory (LSTM). Attention mechanisms have been widely used in the decoder for video captioning. Keyframes are obtained from very long videos using methods such as dynamic mode decomposition, an algorithm from fluid dynamics, and OpenCV’s absdiff(). We propose these tools for motion detection and video/image captioning for very long videos, which are common in video surveillance. © 2021, Bharati Vidyapeeth's Institute of Computer Applications and Management.
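The frame-differencing idea mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' pipeline: it is a pure-Python stand-in for the per-pixel absolute difference that OpenCV's absdiff() computes, it does not implement dynamic mode decomposition, and the function names, the nested-list frame format, and the threshold value are all assumptions made for illustration.

```python
def mean_abs_diff(frame_a, frame_b):
    """Mean per-pixel absolute difference between two grayscale frames.

    Frames are assumed to be equally sized nested lists of intensities
    (a hypothetical stand-in for image arrays).
    """
    total = 0
    count = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for a, b in zip(row_a, row_b):
            total += abs(a - b)
            count += 1
    return total / count if count else 0.0


def select_keyframes(frames, threshold=10.0):
    """Return indices of frames that differ from their predecessor
    by more than `threshold` (an assumed, tunable value)."""
    keyframes = []
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold:
            keyframes.append(i)
    return keyframes


# Tiny synthetic "video": a static scene, a change, then a return to static.
still = [[0, 0], [0, 0]]
moved = [[0, 200], [0, 0]]
video = [still, still, moved, moved, still]
print(select_keyframes(video))
```

In this toy input only the frames where the scene changes (indices 2 and 4) exceed the threshold, which is the kind of transition a summary video would retain.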
Spatiotemporal Assessment of Satellite Image Time Series for Land Cover Classification Using Deep Learning Techniques: A Case Study of Reunion Island, France
(MDPI, 2022) Navnath, N.N.; Chandrasekaran, K.; Stateczny, A.; Sundaram, V.M.; Prabhavathy, P.
Current Earth observation systems generate massive amounts of satellite image time series (SITS) to keep track of geographical areas over time and to monitor and identify environmental and climate change. Efficiently analyzing such data remains an unresolved issue in remote sensing. In classifying land cover, using SITS rather than a single image can help differentiate between classes because of their varied temporal patterns. The aim was to predict the land cover class of a group of pixels, framed as a multi-class single-label classification problem, given their time series gathered from satellite images. In this article, we exploit SITS to assess the capability of several spatial and temporal deep learning models alongside the proposed architecture. The models implemented are the bidirectional gated recurrent unit (GRU), temporal convolutional neural networks (TCNN), GRU + TCNN, attention on TCNN, and attention on GRU + TCNN. The proposed architecture integrates univariate, multivariate, and pixel coordinates for Reunion Island’s land cover classification (LCC). The evaluation of the proposed architecture with deep neural networks on the test dataset determined that blending univariate and multivariate with a recurrent neural network and pixel coordinates achieved increased accuracy with higher F1 scores for each class label. The results suggest that the models also performed exceptionally well when executed in a partitioned manner for the LCC task compared to the temporal models. This study demonstrates that using deep learning approaches paired with spatiotemporal SITS data addresses the difficult task of cost-effectively classifying land cover, contributing to a sustainable environment. © 2022 by the authors.
Semantic context driven language descriptions of videos using deep neural network
(Springer Science and Business Media Deutschland GmbH, 2022) Naik, D.; Jaidhar, C.D.
The massive addition of text, image, and video data to the internet has made computer vision-based tasks challenging in the big data domain. Exploring video data and captioning visual information remain arduous tasks in computer vision. Visual captioning involves integrating visual information with natural language descriptions. This paper proposes an encoder-decoder framework with a 2D-Convolutional Neural Network (CNN) model and layered Long Short Term Memory (LSTM) as the encoder, and an LSTM model integrated with an attention mechanism as the decoder, trained with a hybrid loss function. Visual feature vectors extracted from the video frames using a 2D-CNN model capture spatial features. These visual feature vectors are then fed into the layered LSTM to capture the temporal information. The attention mechanism enables the decoder to perceive and focus on relevant objects and to correlate the visual context and language content for producing semantically correct captions. The visual features and GloVe word embeddings are input into the decoder to generate natural semantic descriptions for the videos. The performance of the proposed framework is evaluated on the video captioning benchmark dataset Microsoft Video Description (MSVD) using various well-known evaluation metrics. The experimental findings indicate that the suggested framework outperforms state-of-the-art techniques. Compared to state-of-the-art methods, the proposed model significantly improved all measures, B@1, B@2, B@3, B@4, METEOR, and CIDEr, with scores of 78.4, 64.8, 54.2, 43.7, 32.3, and 70.7, respectively. The improvement in all scores indicates a better grasp of the context of the inputs, which results in more accurate caption prediction. © 2022, The Author(s).
A novel Multi-Layer Attention Framework for visual description prediction using bidirectional LSTM
(Springer Science and Business Media Deutschland GmbH, 2022) Naik, D.; Jaidhar, C.D.
The massive influx of text, images, and videos to the internet has recently increased the challenge of computer vision-based tasks in big data. Integrating visual data with natural language to generate video descriptions has been a challenge for decades. However, recent experiments on image/video captioning that employ Long Short-Term Memory (LSTM) have piqued the interest of researchers studying its possible application in video captioning. The proposed video captioning architecture combines a bidirectional multilayer LSTM (BiLSTM) encoder and a unidirectional decoder. The architecture also considers temporal relations when creating global video representations. In contrast to the majority of prior work, the most relevant features of a video are selected and utilized specifically for captioning purposes. Existing methods utilize a single-layer attention mechanism for linking visual input with phrase meaning. This approach employs LSTMs and a multilayer attention mechanism to extract features from videos, construct links between multi-modal (words and visual material) representations, and generate sentences with rich semantic coherence. In addition, we evaluated the performance of the suggested system using a benchmark dataset for video captioning. The obtained results reveal superior performance relative to state-of-the-art works in METEOR and promising performance in BLEU score. In terms of quantitative performance, the proposed approach outperforms most existing methodologies. © 2022, The Author(s).
Exploiting skeleton-based gait events with attention-guided residual deep learning model for human identification
(Springer, 2023) Rashmi, M.; Guddeti, R.M.R.
Human identification using unobtrusive visual features is a daunting task in smart environments. Gait is a suitable biometric feature when the camera cannot correctly capture the human face due to environmental factors. In recent years, gait-based human identification using skeleton data has been intensively studied using a variety of feature extractors and more sophisticated deep learning models. Although skeleton data is susceptible to changes in covariate variables, resulting in noisy data, most existing algorithms employ a single feature extraction technique for all frames to generate frame-level feature maps. This results in degraded performance and additional features, necessitating increased computing power. This paper proposes a robust feature extractor that extracts a quantitative summary of gait event-specific information, thereby reducing the total number of features throughout the gait cycle. In addition, a novel attention-guided LSTM-based deep learning model with residual connections is proposed to learn the extracted features for gait recognition. The proposed approach outperforms state-of-the-art works on five publicly available datasets under various benchmark evaluation protocols and metrics. Further, the CMC test revealed that the proposed model obtained higher than 97% accuracy in lower-level ranks on these datasets. © 2023, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
Latent fingerprint segmentation using multi-scale attention U-Net
(Inderscience Publishers, 2024) Akhila, P.; Koolagudi, S.G.
Latent fingerprints are fingerprints lifted from crime scene surfaces. Segmentation of latent fingerprints from the background is an important preprocessing task that is challenging due to the poor quality of the fingerprints. Though fingerprint segmentation approaches based on orientation and frequency are reported in the literature, they could not adequately address the problem. In this work, we propose a latent fingerprint segmentation model based on the U-Net attention network. We added the Atrous Spatial Pyramid Pooling (ASPP) layer to the network to facilitate multi-scale fingerprint segmentation. Our approach could effectively segment the latent fingerprint region from the background and even detect occluded and partial fingerprints with a simple network architecture. To evaluate the performance, we have compared our results with the manual ground truth using the NIST SD27A dataset. Our segmentation model has improved matching accuracy on the NIST SD27A dataset. © 2024 Inderscience Enterprises Ltd.
Efficient Kalman filter based deep learning approaches for workload prediction in cloud and edge environments
(Springer, 2025) Kumar, M.R.; Annappa, B.; Yadav, V.
Offering cloud resources to consumers presents several difficulties for cloud service providers. When utilizing resources efficiently in cloud and edge contexts, precisely forecasting workload is a crucial problem. Accurate workload prediction allows intelligent resource allocation, preventing needless waste of computational and storage resources while meeting users’ Quality of Service (QoS). To address this issue, novel Kalman filter-based hybrid models, including Long Short Term Memory (LSTM), Bi-directional Long Short Term Memory (BI-LSTM), and Gated Recurrent Unit (GRU), are proposed. These models utilize CNN and attention mechanisms to predict workloads at edge servers accurately. The proposed models were extensively evaluated on real-world traces such as the Alibaba_v2018, Materna, Bitbrains, Microsoft Azure_2019 and Planet lab datasets at various time intervals, with and without the Kalman filter. The experimental comparison shows reductions in MSE, with respect to CPU utilization %, of 97%, 82% and 90% for Alibaba; 73%, 73% and 63% for Materna; 72%, 63% and 40% for Planet lab; 95%, 77% and 96% for Microsoft Azure; and 91%, 87% and 91% for Bitbrains. The effectiveness of the proposed forecasting model is validated through statistical analysis using the Friedman and Nemenyi post-hoc tests. © The Author(s), under exclusive licence to Springer-Verlag GmbH Austria, part of Springer Nature 2024.
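To illustrate the kind of Kalman filtering the abstract above refers to, here is a minimal sketch of a scalar Kalman filter applied to a noisy CPU-utilization trace. It is not the authors' model: the constant-level (random walk) state model, the function name, the variance parameters, and the sample values are all assumptions chosen for illustration, and the deep learning stage (CNN, attention, LSTM/GRU) is not shown.

```python
def kalman_smooth(measurements, process_var=1e-3, measurement_var=0.1):
    """Filter a 1-D series with a scalar Kalman filter.

    Assumes the true level follows a random walk with variance
    `process_var` and is observed with noise variance `measurement_var`
    (both hypothetical defaults, to be tuned per trace).
    """
    if not measurements:
        return []
    estimate = measurements[0]   # initial state estimate
    error_var = 1.0              # initial estimate uncertainty (assumed)
    filtered = []
    for z in measurements:
        # Predict: the level is assumed unchanged, uncertainty grows.
        error_var += process_var
        # Update: blend prediction and measurement using the Kalman gain.
        gain = error_var / (error_var + measurement_var)
        estimate += gain * (z - estimate)
        error_var *= (1.0 - gain)
        filtered.append(estimate)
    return filtered


# Hypothetical CPU utilization samples (percent), not taken from any dataset.
cpu = [50, 55, 45, 52, 48, 51, 49, 53, 47, 50]
filtered = kalman_smooth(cpu)
print([round(v, 2) for v in filtered])
```

The filtered series could then serve as a denoised input to a forecasting model; how the paper actually couples the filter with its networks is not specified in the abstract.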
