Faculty Publications
Permanent URI for this community: https://idr.nitk.ac.in/handle/123456789/18736
Publications by NITK Faculty
Search Results
Item NucleiSegNet: Robust deep learning architecture for the nuclei segmentation of liver cancer histopathology images (Elsevier Ltd, 2021) Lal, S.; Das, D.; Alabhya, K.; Kanfade, A.; Kumar, A.; Kini, J.R.
The nuclei segmentation of hematoxylin and eosin (H&E) stained histopathology images is an important prerequisite in designing a computer-aided diagnostics (CAD) system for cancer diagnosis and prognosis. Automated nuclei segmentation methods enable the qualitative and quantitative analysis of tens of thousands of nuclei within H&E stained histopathology images. A major challenge, however, is the segmentation of variable-sized, touching nuclei. To address this challenge, we present NucleiSegNet, a robust deep learning network architecture for the nuclei segmentation of H&E stained liver cancer histopathology images. The proposed architecture comprises three blocks: a robust residual block, a bottleneck block, and an attention decoder block. The robust residual block is a newly proposed block for the efficient extraction of high-level semantic maps. The attention decoder block uses a new attention mechanism for efficient object localization and improves the architecture's performance by reducing false positives. Applied to nuclei segmentation on H&E stained histopathology images from two datasets, the proposed deep learning architecture outperforms state-of-the-art nuclei segmentation methods. As part of this work, we also introduce a new liver dataset (KMC liver dataset) of H&E stained liver cancer histopathology image tiles, containing 80 images with annotated nuclei procured from Kasturba Medical College (KMC), Mangalore, Manipal Academy of Higher Education (MAHE), Manipal, Karnataka, India.
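The attention-decoder block is described above only at a high level; as a rough illustration, here is a minimal NumPy sketch of a generic additive attention gate (in the style of Attention U-Net), which re-weights encoder skip features with a per-pixel coefficient so that background responses, a source of false positives, are suppressed. All names and weight shapes here are illustrative assumptions, not the paper's actual block definition.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def additive_attention_gate(skip, gate, W_s, W_g, psi):
    """Re-weight encoder skip features (H, W, C) by a per-pixel
    attention coefficient computed from the decoder gating signal."""
    q = np.maximum(skip @ W_s + gate @ W_g, 0.0)  # ReLU on the joint projection
    alpha = sigmoid(q @ psi)                      # attention map in (0, 1), shape (H, W, 1)
    return skip * alpha                           # background suppressed, nuclei kept

rng = np.random.default_rng(0)
H, W, C, Ci = 4, 4, 8, 4
skip = rng.normal(size=(H, W, C))                 # encoder skip features
gate = rng.normal(size=(H, W, C))                 # decoder gating signal
W_s, W_g = rng.normal(size=(C, Ci)), rng.normal(size=(C, Ci))
psi = rng.normal(size=(Ci, 1))
out = additive_attention_gate(skip, gate, W_s, W_g, psi)
```

Because the gate only scales features by a factor in (0, 1), it can attenuate spurious activations but never amplify them.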
The proposed model's source code is available at https://github.com/shyamfec/NucleiSegNet. © 2020 Elsevier Ltd

Item O-SegNet: Robust Encoder and Decoder Architecture for Objects Segmentation from Aerial Imagery Data (Institute of Electrical and Electronics Engineers Inc., 2022) Eerapu, K.K.; Lal, S.; Narasimhadhan, A.V.
The segmentation of diversified roads and buildings from high-resolution aerial images is essential for various applications, such as urban planning, disaster assessment, traffic congestion management, and up-to-date road maps. However, a major challenge during object segmentation is segmenting small, diversely shaped roads and buildings against dominant backgrounds. To address this challenge, we introduce O-SegNet, a robust encoder-decoder architecture for object segmentation from high-resolution aerial imagery data. The proposed O-SegNet architecture contains Guided-Attention (GA) blocks in the encoder and decoder that focus on salient features by representing the spatial dependencies between features of different scales. Further, the GA blocks guide the successive stages of the encoder and decoder by interrelating pixels of the same class. To emphasize the relevant context, an attention mechanism is placed between the encoder and decoder after aggregating global context via an 8-level Pyramid Pooling Network (PPN). The qualitative and quantitative results of the proposed and existing semantic segmentation architectures are evaluated on the dataset provided by Kaiser et al. Further, we show that the proposed O-SegNet architecture outperforms state-of-the-art techniques by accurately preserving road connectivity and building structure.
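The PPN is only named in this abstract; as a hedged sketch (four bins instead of the paper's eight levels, NumPy instead of a trained network), pyramid pooling can be illustrated as average-pooling the feature map at several grid sizes, upsampling each grid back, and concatenating along channels:

```python
import numpy as np

def pyramid_pool(feat, bins=(1, 2, 4, 8)):
    """Average-pool an (H, W, C) map into b x b grids, upsample each
    grid back to (H, W), and concatenate with the input along channels."""
    H, W, C = feat.shape
    pooled = [feat]
    for b in bins:
        hs, ws = H // b, W // b                                  # assumes H, W divisible by b
        grid = feat.reshape(b, hs, b, ws, C).mean(axis=(1, 3))   # (b, b, C) region means
        up = np.repeat(np.repeat(grid, hs, axis=0), ws, axis=1)  # nearest-neighbour upsample
        pooled.append(up)
    return np.concatenate(pooled, axis=-1)

rng = np.random.default_rng(1)
feat = rng.normal(size=(8, 8, 3))
out = pyramid_pool(feat)   # channels: 3 * (1 + len(bins)) = 15
```

The b = 1 bin contributes the global average of each channel, which is how this construction injects image-wide context into every pixel.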
© 2017 IEEE.

Item Semantic context driven language descriptions of videos using deep neural network (Springer Science and Business Media Deutschland GmbH, 2022) Naik, D.; Jaidhar, C.D.
The massive addition of text, image, and video data to the internet has made computer vision tasks challenging in the big data domain. Recent exploration of video data and progress in visual information captioning has been an arduous task in computer vision, as visual captioning requires integrating visual information with natural language descriptions. This paper proposes an encoder-decoder framework with a 2D Convolutional Neural Network (CNN) and layered Long Short-Term Memory (LSTM) as the encoder, and an LSTM integrated with an attention mechanism as the decoder, trained with a hybrid loss function. Visual feature vectors extracted from the video frames using the 2D CNN capture spatial features, and these vectors are fed into the layered LSTM to capture temporal information. The attention mechanism enables the decoder to perceive and focus on relevant objects and to correlate the visual context with the language content for producing semantically correct captions. The visual features and GloVe word embeddings are input to the decoder to generate natural semantic descriptions for the videos. The performance of the proposed framework is evaluated on the video captioning benchmark dataset Microsoft Video Description (MSVD) using various well-known evaluation metrics. The experimental findings indicate that the suggested framework outperforms state-of-the-art techniques: compared to state-of-the-art methods, the proposed model improved all measures, B@1, B@2, B@3, B@4, METEOR, and CIDEr, with scores of 78.4, 64.8, 54.2, 43.7, 32.3, and 70.7, respectively. The improvement across all scores indicates a better grasp of the context of the inputs, which results in more accurate caption prediction.
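The attention step in this framework is a standard soft-attention computation; a minimal NumPy sketch (names and dimensions are my own, not the paper's) scores each encoded frame feature against the decoder hidden state and returns a weighted context vector:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(frame_feats, hidden, W_f, W_h, v):
    """Bahdanau-style attention: score T frame features (T, D) against
    the decoder hidden state (D,) and return the weighted context (D,)."""
    scores = np.tanh(frame_feats @ W_f + hidden @ W_h) @ v  # one score per frame, (T,)
    weights = softmax(scores)                               # attention distribution over frames
    context = weights @ frame_feats                         # convex combination of features
    return context, weights

rng = np.random.default_rng(2)
T, D, A = 6, 5, 4
frame_feats = rng.normal(size=(T, D))
hidden = rng.normal(size=D)
W_f, W_h, v = rng.normal(size=(D, A)), rng.normal(size=(D, A)), rng.normal(size=A)
context, weights = attend(frame_feats, hidden, W_f, W_h, v)
```

Recomputing the weights at every decoding step is what lets the decoder focus on different frames while emitting different words.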
© 2022, The Author(s).

Item A novel Multi-Layer Attention Framework for visual description prediction using bidirectional LSTM (Springer Science and Business Media Deutschland GmbH, 2022) Naik, D.; Jaidhar, C.D.
The massive influx of text, images, and videos to the internet has recently increased the challenge of computer vision-based tasks in big data. Integrating visual data with natural language to generate video explanations has been a challenge for decades. Recent experiments on image and video captioning that employ Long Short-Term Memory (LSTM) have piqued the interest of researchers studying its possible application in video captioning. The proposed video captioning architecture combines a bidirectional multilayer LSTM (BiLSTM) encoder with a unidirectional decoder. The architecture also considers temporal relations when creating superior global video representations, and, in contrast to the majority of prior work, the most relevant features of a video are selected and utilized specifically for captioning. Whereas existing methods use a single-layer attention mechanism to link visual input with phrase meaning, this approach employs LSTMs and a multilayer attention mechanism to extract features from videos, construct links between multi-modal (word and visual) representations, and generate sentences with rich semantic coherence. We evaluated the performance of the suggested system using a benchmark dataset for video captioning. The obtained results reveal superior performance relative to state-of-the-art works in METEOR and promising performance in BLEU score. In terms of quantitative performance, the proposed approach outperforms most existing methodologies.
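The BiLSTM encoder above can be illustrated with a toy recurrence; this NumPy sketch substitutes a plain tanh RNN cell for the LSTM (an assumption made for brevity) but shows the bidirectional idea of concatenating forward and backward state sequences per time step:

```python
import numpy as np

def simple_rnn(seq, W_x, W_h, h0):
    """Minimal tanh recurrence, standing in for an LSTM cell."""
    h, states = h0, []
    for x in seq:                       # seq: (T, D) frame features
        h = np.tanh(x @ W_x + h @ W_h)
        states.append(h)
    return np.stack(states)             # (T, H)

def bidirectional_encode(seq, W_x, W_h, h0):
    """Run the recurrence forward and backward over the sequence and
    concatenate the two state sequences at each time step."""
    fwd = simple_rnn(seq, W_x, W_h, h0)
    bwd = simple_rnn(seq[::-1], W_x, W_h, h0)[::-1]
    return np.concatenate([fwd, bwd], axis=-1)  # (T, 2H)

rng = np.random.default_rng(3)
T, D, Hd = 5, 4, 6
seq = rng.normal(size=(T, D))
W_x, W_h = rng.normal(size=(D, Hd)), rng.normal(size=(Hd, Hd))
enc = bidirectional_encode(seq, W_x, W_h, np.zeros(Hd))
```

Each time step of `enc` thus summarizes both the frames before it and the frames after it, which is the property the paper exploits for global video representations.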
© 2022, The Author(s).

Item Automated hard exudate segmentation using neural encoders and attention mechanisms for diabetic retinopathy diagnosis (Inderscience Publishers, 2023) Gawas, P.; Sowmya Kamath, S.
Diabetic retinopathy (DR) is a complication of elevated blood glucose levels that damages the retina of diabetic patients' eyes. If not discovered and treated early, it can lead to vision loss. Hard exudates (HE) are one of its characteristic signs, and identifying them is a paramount step in the early diagnosis of DR. In this work, the suitability of U-Net-based deep CNNs with different encoder configurations and attention gates (AG) is investigated for HE segmentation. The proposed models were benchmarked on the standard IDRiD dataset. To overcome the limited dataset size, data augmentation techniques were applied to generate image patches for model training. Extensive experiments on the dataset revealed that U-Net with AG achieved an accuracy of 98.8%, while U-Net with a ResNet50 encoder backbone achieved 98.64%. The findings show that the presented models are effective and suitable for early-stage clinical diagnosis. © 2023 Inderscience Enterprises Ltd.

Item Solar irradiation forecast enhancement using clustering based CNN-BiLSTM-attention hybrid architecture with PSO (Taylor and Francis Ltd., 2024) Chiranjeevi, M.; Madyastha, A.; Maurya, A.K.; Moger, T.; Jena, D.
Accurate solar irradiation forecasting is essential for optimising solar energy use. This paper presents a novel forecasting approach: the 'Clustering-based CNN-BiLSTM-Attention Hybrid Architecture with PSO'. It combines clustering, attention mechanisms, Convolutional Neural Networks (CNN), Bidirectional Long Short-Term Memory (BiLSTM) networks, and Particle Swarm Optimisation (PSO) into a unified framework. Clustering categorises days into groups, improving predictive capabilities.
The CNN-BiLSTM model captures spatial and temporal features, identifying complex patterns. PSO optimises the hybrid model's hyperparameters, while an attention mechanism assigns probability weights to the most relevant information, enhancing performance. By leveraging spatial and temporal patterns in solar data, the proposed model improves forecasting accuracy in univariate and multivariate analyses with multi-step predictions. Extensive tests on real-world datasets from various locations show the model's effectiveness. For example, on NASA power data, the model achieves a Mean Absolute Error (MAE) of 24.028 W/m², a Root Mean Square Error (RMSE) of 43.025 W/m², and an R² score of 0.984 for 1-hour-ahead forecasting. The results show significant improvements over conventional methods. © 2024 Informa UK Limited, trading as Taylor & Francis Group.

Item A self-attention driven retinex-based deep image prior model for satellite image restoration (Elsevier Ltd, 2024) Shastry, A.; Padikkal, J.; George, S.; Bini, A.A.
A self-attention driven Deep Image Prior (DIP) framework is proposed in this work for restoring satellite images corrupted by speckle interference and contrast deficiency. The retinex-based framework leverages the benefits of the DIP approach for image restoration, requiring only a single input image and eliminating the need for ground truth or training data. An attention framework is incorporated into the architecture of the DIP networks to effectively capture fine textures, enhancing the restoration capability of the model. Two generative networks are employed to obtain the luminance and reflectance maps, with the model's loss functions specifically designed to tackle the speckle interference and contrast distortions present in the input. These generated maps then reconstruct the enhanced version of the image. Satellite images from different sensors are used to demonstrate and compare the performance of the model.
Various state-of-the-art models are evaluated and compared with the proposed strategy using different image quality metrics and statistical tests. The experimental results, incorporating both visual and statistical inferences, demonstrate the superiority and efficiency of the model. Additionally, an ablation analysis is performed to determine optimal regularization parameters, and the significance of integrating attention modules at different layers of the architecture is demonstrated. © 2023 Elsevier Ltd

Item AAPFC-BUSnet: Hierarchical encoder–decoder based CNN with attention aggregation pyramid feature clustering for breast ultrasound image lesion segmentation (Elsevier Ltd, 2024) Sushma, B.; Pulikala, A.
Breast cancer poses a serious menace to women's health and lives, underscoring the urgency of accurate tumor detection. Detecting both cancerous and non-cancerous breast tumors has become increasingly crucial, with ultrasound imaging emerging as a widely adopted modality for this purpose. However, identifying breast lesions in ultrasound images is challenging due to varied tumor morphologies and geometries, similar color intensity distributions, and fuzzy boundaries, particularly for irregularly shaped malignant tumors. This work proposes an encoder–decoder based U-shaped convolutional neural network (CNN) variant with an attention aggregation-based pyramid feature clustering module (AAPFC) to detect breast lesion regions. The network consists of a U-Net variant as the base network and the AAPFC, which fuses features extracted at various levels of the base U-Net using a suitable feature fusion technique. Furthermore, deformable convolution with an adaptive self-attention mechanism is introduced to decode the pyramid features in parallel, capturing varied geometric features at multiple stages.
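The AAPFC module's exact fusion rule is not given in this abstract; as a hedged sketch, attention-based aggregation of same-shaped pyramid features can be written as a softmax-weighted sum over levels (the per-level scores would be learned in practice, and are free parameters here):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_aggregate(levels, scores):
    """Fuse L same-shaped (H, W, C) feature maps with one softmax
    attention weight per pyramid level."""
    w = softmax(np.asarray(scores, dtype=float))  # (L,) weights summing to 1
    stacked = np.stack(levels)                    # (L, H, W, C)
    return np.tensordot(w, stacked, axes=1)       # weighted sum -> (H, W, C)

rng = np.random.default_rng(4)
levels = [rng.normal(size=(4, 4, 2)) for _ in range(3)]
fused = attention_aggregate(levels, scores=[0.0, 0.0, 0.0])  # equal scores -> plain averaging
```

With unequal scores the softmax lets the network emphasize whichever pyramid level is most informative for the lesion at hand, rather than averaging all scales uniformly.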
Two public breast lesion ultrasound datasets, consisting of 263 malignant, 547 benign, and 133 normal images, are used to evaluate the proposed model against state-of-the-art deep CNN-based segmentation models. The proposed model achieves 96% accuracy, 68% mean IoU, 97% specificity, 82% sensitivity, and a 0.747 kappa score. The qualitative and quantitative performance analyses show that the proposed model performs better in breast lesion segmentation on ultrasound images. © 2024 Elsevier Ltd

Item A unified vehicle trajectory prediction model using multi-level context-aware graph attention mechanism (Springer, 2024) Sundari, K.; Senthil Thilak, A.S.
Predicting the mobility patterns of vehicles together with their interactions with surrounding traffic objects is a critical task in autonomous driving systems. Existing graph neural network-based trajectory prediction models primarily capture the structural connectivity of network nodes (road objects) and assume equal priority for all neighbors of a node. However, in real traffic networks, the behavior of each vehicle is significantly influenced by its neighboring road objects, and this influence is not uniform. This necessitates a neighbor interaction-aware trajectory prediction model that assumes non-uniform priority among neighboring nodes. In this article, we design a novel unified trajectory prediction model suitable for both highway and urban traffic conditions. The proposed approach seamlessly integrates multi-level context modeling using graph attention mechanisms, capturing and leveraging interactions and dependencies among objects at varied levels of proximity within a graph. Additionally, it employs an encoder–decoder long short-term memory architecture for long-term trajectory prediction, ensuring adaptability to different driving scenarios.
The graph attention mechanisms play a crucial role in modeling spatial dependencies between vehicles, allowing the proposed model, MC-GATP, to adapt dynamically to interactions that evolve over time. Experiments on real-world trajectory datasets, namely the Next Generation Simulation US-101 highway dataset and diverse urban datasets such as ApolloScape and Argoverse, demonstrate the remarkable performance of MC-GATP in long-term trajectory prediction. The model shows superior prediction accuracy, scalability, and computational efficiency in both highway and urban environments. © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.

Item CAEB7-UNet: An Attention-Based Deep Learning Framework for Automated Segmentation of C-Spine Vertebrae in CT Images (Institute of Electrical and Electronics Engineers Inc., 2025) Pandey, A.K.; Senapati, K.; Pateel, G.P.
Accurate segmentation of vertebrae in computed tomography (CT) images poses serious challenges due to irregular vertebral boundaries, low contrast and brightness, and noise in CT scans. This study presents a novel channel attention-based EfficientNetB7-UNet (CAEB7-UNet) to address this complex task effectively. The proposed model introduces an upgraded ReLU-based channel attention module (CAM) in the skip connections, which suppresses nonessential attributes and emphasizes relevant features to boost overall segmentation performance. In this work, an improved EfficientNetB7 is employed as the encoder for feature extraction, the fusion of local and global features is enhanced through the upgraded CAM in the skip connections, and up-sampling is performed in the decoder. Further, the model is optimized through hyperparameter optimization, specifically hybrid learning-rate scheduler strategies, along with the AdamW optimizer and custom data augmentation.
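The upgraded CAM itself is not specified in this abstract; a generic squeeze-and-excitation-style channel attention sketch (an assumption for illustration, not the paper's exact module) pools each channel to one descriptor, passes it through a small ReLU bottleneck, and re-weights the channels:

```python
import numpy as np

def channel_attention(feat, W1, W2):
    """Squeeze-and-excitation-style re-weighting of an (H, W, C) map."""
    desc = feat.mean(axis=(0, 1))            # squeeze: one descriptor per channel, (C,)
    z = np.maximum(desc @ W1, 0.0)           # ReLU bottleneck, (C // r,)
    gates = 1.0 / (1.0 + np.exp(-(z @ W2)))  # sigmoid gates in (0, 1), (C,)
    return feat * gates                      # excite: scale each channel by its gate

rng = np.random.default_rng(5)
H, W, C, r = 4, 4, 8, 2
feat = rng.normal(size=(H, W, C))            # skip-connection features
W1, W2 = rng.normal(size=(C, C // r)), rng.normal(size=(C // r, C))
out = channel_attention(feat, W1, W2)
```

Placing such a gate on the skip connection lets the decoder receive encoder features with uninformative channels attenuated, which is the suppression-and-emphasis behavior the abstract describes.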
A total of 34,782 CT images obtained from the RSNA-2022 cervical spine fracture detection challenge are used in this study. The proposed model achieves outstanding performance, yielding a Dice score index (DSI) of 96.14% and a mean intersection over union (mIoU) of 91.46%. Moreover, a comparative performance analysis of CAEB7-UNet against two state-of-the-art models is carried out on the same dataset. Our approach outperforms both models, exceeding the better of the two by 8.1%, 6.73%, 12.7%, and 11.98% in DSI, mIoU, precision, and F1-score, respectively. Additionally, it requires merely 0.38 seconds to generate the segmentation mask for a single CT slice. © 2013 IEEE.
