Journal Articles

Permanent URI for this collection: https://idr.nitk.ac.in/handle/123456789/19884

Search Results

Now showing 1 - 2 of 2
  • Item
    DIResUNet: Architecture for multiclass semantic segmentation of high resolution remote sensing imagery data
    (Springer, 2022) Priyanka; Sravya, N.; Lal, S.; Nalini, J.; Chintala, C.S.; Dell’Acqua, F.
    Scene understanding is an important task in information extraction from high-resolution aerial images, an operation which is often involved in remote sensing applications. Recently, semantic segmentation using deep learning has become an important method to achieve state-of-the-art performance in pixel-level classification of objects. The latter remains a challenging task due to large pixel variance within classes, possibly coupled with small pixel variance between classes. This paper proposes an artificial-intelligence (AI)-based approach to this problem, by designing the DIResUNet deep learning model. The model is built by integrating the inception module, a modified residual block, and a dense global spatial pyramid pooling (DGSPP) module, in combination with the well-known U-Net scheme. The modified residual blocks and the inception module extract multi-level features, whereas DGSPP extracts contextual intelligence. In this way, both local and global information about the scene are extracted in parallel using dedicated processing structures, resulting in a more effective overall approach. The performance of the proposed DIResUNet model is evaluated on the Landcover and WHDLD high resolution remote sensing (HRRS) datasets. We compared DIResUNet performance with recent benchmark models such as U-Net, UNet++, Attention UNet, FPN, UNet+SPP, and DGRNet to prove the effectiveness of our proposed model. Results show that the proposed DIResUNet model outperforms benchmark models on two HRRS datasets. © 2022, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
  • Item
    Transformer assisted framework for automated multi-class abnormality classification for video capsule endoscopy
    (Institute of Physics, 2025) Prabhu, M.M.; Kaliki, V.S.; Lal, S.
    Video Capsule Endoscopy (VCE) is a minimally invasive imaging technique used for diagnosing gastrointestinal (GI) disorders, enabling detailed visualization of the digestive tract. This study introduces CASCRNet, a novel and parameter-efficient deep learning architecture designed to enhance interpretability and computational efficiency in multi-class abnormality classification for VCE. CASCRNet integrates focal loss, Atrous Spatial Pyramid Pooling, and Shared Channel Residual blocks to improve feature extraction and address class imbalance. In addition to CASCRNet, this study conducts a comprehensive evaluation of several deep learning models, including ResNet50, DenseNet121, RCCGNet, Hiera, and AIMv2. Among these, AIMv2, a fine-tuned transformer-based model, achieved the highest overall performance, serving as a new benchmark for accuracy. The proposed framework demonstrates robust results on the Capsule Vision 2024 dataset and highlights the potential of both lightweight and transformer-based solutions to improve diagnostic efficiency and clinical workflow in gastrointestinal imaging. © 2025 IOP Publishing Ltd. All rights, including for text and data mining, AI training, and similar technologies, are reserved.
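
The second abstract names focal loss as the component that addresses class imbalance, but does not state its form. As a rough illustration only, the sketch below uses the commonly cited formulation FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t) for a single sample. The function names, the defaults gamma=2.0 and alpha=1.0, and the example logits are assumptions made for this sketch; they are not taken from the CASCRNet paper.

```python
import math


def softmax(logits):
    """Convert raw class scores to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, gamma=2.0, alpha=1.0):
    """Focal loss for one sample.

    gamma and alpha are illustrative defaults (assumptions), not values
    reported in the abstract. With gamma=0 and alpha=1 this reduces to
    ordinary cross-entropy.
    """
    p_t = softmax(logits)[target]
    p_t = max(p_t, 1e-12)  # guard against log(0)
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


def cross_entropy(logits, target):
    """Plain cross-entropy, expressed as focal loss with gamma=0."""
    return focal_loss(logits, target, gamma=0.0, alpha=1.0)


# Hypothetical logits for a 3-class problem, true class index 0.
easy = [5.0, 0.0, 0.0]  # confident and correct
hard = [0.0, 5.0, 0.0]  # confident and wrong

for name, logits in (("easy", easy), ("hard", hard)):
    ce = cross_entropy(logits, 0)
    fl = focal_loss(logits, 0)
    print(f"{name}: cross-entropy={ce:.4f} focal={fl:.4f} ratio={fl / ce:.4f}")
```

The ratio of focal loss to cross-entropy equals (1 - p_t)^gamma, so well-classified samples are down-weighted far more than misclassified ones. This is the usual argument for why the loss helps when frequent, easy classes would otherwise dominate training.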