Faculty Publications

Permanent URI for this community: https://idr.nitk.ac.in/handle/123456789/18736

Publications by NITK Faculty

Now showing 1 - 10 of 12
  • Item
    A Survey on Semantic Segmentation Models for Underwater Images
    (Springer, 2023) Anand, S.K.; Kumar, P.V.; Saji, R.; Gadagkar, A.V.; Chandavarkar, B.R.
    Semantic segmentation remains a key research field in modern-day computer vision and has been used in a myriad of applications across various fields. It can be extremely beneficial in the study of underwater scenes. Various underwater applications, such as unmanned explorations and autonomous underwater vehicles, require accurate object classification and detection to allow the probes to avoid hazardous objects. However, the models that work well for terrestrial images rarely work just as well for underwater images, because underwater images suffer from high blue-light intensity as well as other ill effects such as poor lighting and contrast. These can be mitigated with preprocessing techniques that manually improve the image characteristics. Modifying the model itself to account for bad image quality is not a great method, as the model may misidentify noise as an image characteristic. In this chapter, six deep learning semantic segmentation models are explored: SegNet, Pyramid Scene Parsing Network (PSP-Net), U-Net, DNN-VGG (Deep Neural Network-VGG), DeepLabv3+, and SUIM-Net. Their architectures, technical aspects with respect to underwater images, advantages, and disadvantages are all investigated. © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
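    Surveys like this typically compare segmentation models by per-class intersection-over-union (IoU) and its mean (mIoU). The abstract does not state the chapter's exact evaluation protocol, so the following is only a generic NumPy sketch of the metric:

    ```python
    import numpy as np

    def per_class_iou(pred, target, num_classes):
        """Per-class intersection-over-union between two integer label maps."""
        ious = []
        for c in range(num_classes):
            inter = np.logical_and(pred == c, target == c).sum()
            union = np.logical_or(pred == c, target == c).sum()
            # NaN marks a class absent from both prediction and ground truth
            ious.append(inter / union if union else float("nan"))
        return ious

    pred = np.array([[0, 0],
                     [1, 1]])
    target = np.array([[0, 1],
                       [1, 1]])
    ious = per_class_iou(pred, target, num_classes=2)
    miou = float(np.nanmean(ious))  # mean IoU over classes present
    ```

    Real evaluations accumulate the confusion matrix over the whole test set rather than averaging per image, but the per-class ratio is the same.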
  • Item
    Semantic Segmentation for Autonomous Driving
    (Springer Science and Business Media Deutschland GmbH, 2023) Divakarla, U.; Bhat, R.; Madagaonkar, S.B.; Pranav, D.V.; Shyam, C.; Chandrashekar, K.
    Autonomous vehicles, namely self-driving cars, are becoming increasingly common in developed urban areas. It is of utmost importance for real-time systems such as robots and autonomous vehicles (AVs) to understand visual data, make inferences, and predict events in the near future. The ability to perceive RGB values (and other visual data such as thermal and LiDAR) and to segment each pixel into objects is called semantic segmentation. It is the first step toward any sort of automated machinery. Some existing models use deep learning methods for 3D object detection in RGB images but are not fully efficient when fused with thermal imagery. In this paper, we summarize many of these architectures, starting from those applicable to general segmentation and moving to those specifically designed for autonomous vehicles. We also cover open challenges and questions for further research. © 2023, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
  • Item
    Semi-supervised Semantic Segmentation for Effusion Cytology Images
    (Springer Science and Business Media Deutschland GmbH, 2023) Aboobacker, S.; Vijayasenan, D.; Sumam David, S.; Suresh, P.K.; Sreeram, S.
    Cytopathologists analyse images captured at different magnifications to detect malignancies in effusions. They identify malignant cell clusters at a lower magnification, and the identified area is then zoomed in to study cell-level details at a higher magnification. Automatic segmentation of low-magnification images saves scanning time and storage requirements. This work predicts malignancy in effusion cytology images at low magnification levels such as 10× and 4×. However, the biggest challenge is the difficulty of annotating the low-magnification images, especially the 4× data. We extend a semi-supervised learning (SSL) semantic model to train on unlabelled 4× data together with labelled 10× data. The benign F-score on predictions of 4× data using the SSL model is improved by 15% compared with predictions of 4× data on the semantic 10× model. © 2023, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
  • Item
    Semantic Segmentation of Underwater Images with CNN Based Adaptive Thresholding
    (Springer Science and Business Media Deutschland GmbH, 2025) Anand, S.K.; Kumar, P.V.; Saji, R.; Gadagkar, A.V.; Chandavarkar, B.R.
    Semantic segmentation remains a key research field in modern-day computer vision and has been used in a myriad of applications across various fields. It can be extremely beneficial in the study of underwater scenes. Various underwater applications, such as unmanned explorations and autonomous underwater vehicles, require accurate object classification and detection to allow the probes to avoid hazardous objects. However, the models that work well for terrestrial images rarely work just as well for underwater images, because underwater images suffer from high blue-light intensity as well as other ill effects such as poor lighting and contrast. Modifying the model to account for bad image quality is not a great method, as the model may misidentify noise as an image characteristic. In this paper, a unique CNN-based approach to post-processing image thresholding is proposed on top of three models used for the semantic segmentation itself: SegNet, U-Net, and DeepLabv3+. The models' outputs are then subjected to the CNN-based post-processing technique to binarize them into masks, providing improved segmentation results compared to the base models. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
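    The abstract describes a learned CNN that binarizes the soft segmentation output, but its architecture is not given here. The sketch below substitutes a hand-crafted local-mean adaptive threshold purely to illustrate the post-processing step; the function name, window size, and bias are illustrative assumptions, not the paper's method:

    ```python
    import numpy as np

    def adaptive_binarize(soft_mask, win=3, bias=0.0):
        """Binarize a soft segmentation map with a per-pixel threshold taken
        from the local mean (a hand-crafted stand-in for a learned CNN)."""
        h, w = soft_mask.shape
        pad = win // 2
        padded = np.pad(soft_mask, pad, mode="edge")
        out = np.zeros((h, w), dtype=np.uint8)
        for i in range(h):
            for j in range(w):
                local = padded[i:i + win, j:j + win]
                # Foreground where the pixel exceeds its neighbourhood mean
                out[i, j] = 1 if soft_mask[i, j] > local.mean() + bias else 0
        return out

    soft = np.array([[0.2, 0.8],
                     [0.2, 0.8]])
    mask = adaptive_binarize(soft)
    ```

    A learned post-processor replaces the fixed local-mean rule with convolutional filters trained to produce the final mask, which is what gives the paper's approach its edge over simple thresholding.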
  • Item
    Semantic Segmentation of Remotely Sensed Images for Land-use and Land-cover Classification: A Comprehensive Review
    (Taylor and Francis Ltd., 2025) Putty, A.; Annappa, B.; Pariserum Perumal, S.
    Remotely Sensed Images (RSI) based land-use and land-cover (LULC) mapping facilitates applications such as forest logging, biodiversity protection, and urban topographical kinetics. This process has gained more attention with the widespread availability of geospatial and remote sensing data. With recent advances in machine learning and the possibility of processing nearly real-time information on the computer, LULC mapping methods broadly fall into two categories: (i) framework-dependent algorithms, where mappings are done using the in-built algorithms in Geographical Information System (GIS) software, and (ii) framework-independent algorithms, which are mainly based on deep learning techniques. Both approaches have their unique advantages and challenges. Along with the working patterns and performances of these two methodologies, this comprehensive review thoroughly analyzes deep learning architectures catering to different technical capabilities such as feature extraction, boundary extraction, transformer-based mechanisms, attention mechanisms, pyramid pooling, and lightweight models. To fine-tune these semantic segmentation processes, current technical and domain challenges and insights into future directions for analysing RSIs of varying spatial and temporal resolutions are summarized. Cross-domain users with application-specific requirements can make use of this study to select appropriate LULC semantic segmentation models. © 2025 IETE.
  • Item
    Dense refinement residual network for road extraction from aerial imagery data
    (Institute of Electrical and Electronics Engineers Inc., 2019) Eerapu, K.K.; Ashwath, B.; Lal, S.; Dell’Acqua, F.; Narasimha Dhan, A.V.
    Extraction of roads from high-resolution aerial images with a high degree of accuracy is a prerequisite in various applications. In aerial images, road pixels and background pixels are generally in a ratio of ones to tens, which implies a class imbalance problem. Existing semantic segmentation architectures generally do well in road-dominated cases but fail in background-dominated scenarios. This paper proposes a dense refinement residual network (DRR Net) for semantic segmentation of aerial imagery data. The proposed architecture is composed of multiple DRR modules for the extraction of diversified roads, alleviating the class imbalance problem. Each module utilizes dense convolutions at various scales, only in the encoder, for feature learning. Residual connections in each module provide a guided learning path by propagating the combined features to subsequent DRR modules. Segmentation maps undergo various levels of refinement depending on the number of DRR modules used in the architecture. To emphasize small object instances, the proposed architecture has been trained with a composite loss function. Qualitative and quantitative results are reported on the Massachusetts roads dataset, and the experiments show that the proposed architecture outperforms other recent architectures. © 2019 Institute of Electrical and Electronics Engineers Inc. All rights reserved.
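    The abstract says DRR Net is trained with a composite loss to emphasize small object instances but does not spell the loss out. A common composite for class-imbalanced segmentation is binary cross-entropy plus Dice loss; the NumPy sketch below is an assumed example of that pattern, not the paper's exact formulation:

    ```python
    import numpy as np

    def bce_loss(pred, target, eps=1e-7):
        """Pixel-wise binary cross-entropy on predicted probabilities."""
        p = np.clip(pred, eps, 1 - eps)
        return float(-np.mean(target * np.log(p) + (1 - target) * np.log(1 - p)))

    def dice_loss(pred, target, eps=1e-7):
        """1 - Dice coefficient; largely insensitive to the road/background ratio."""
        inter = float((pred * target).sum())
        return 1.0 - (2.0 * inter + eps) / (float(pred.sum() + target.sum()) + eps)

    def composite_loss(pred, target, alpha=0.5):
        """Weighted sum of a pixel-wise term and a region-overlap term."""
        return alpha * bce_loss(pred, target) + (1 - alpha) * dice_loss(pred, target)
    ```

    The overlap term keeps thin road segments from being drowned out by the dominant background pixels, which is the point of using a composite rather than cross-entropy alone.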
  • Item
    O-SegNet: Robust Encoder and Decoder Architecture for Objects Segmentation from Aerial Imagery Data
    (Institute of Electrical and Electronics Engineers Inc., 2022) Eerapu, K.K.; Lal, S.; Narasimhadhan, A.V.
    The segmentation of diversified roads and buildings from high-resolution aerial images is essential for various applications, such as urban planning, disaster assessment, traffic congestion management, and up-to-date road maps. However, a major challenge during object segmentation is segmenting small, diversely shaped roads and buildings in background-dominated scenarios. We introduce O-SegNet, a robust encoder and decoder architecture for object segmentation from high-resolution aerial imagery data, to address this challenge. The proposed O-SegNet architecture contains Guided-Attention (GA) blocks in the encoder and decoder to focus on salient features by representing the spatial dependencies between features of different scales. Further, GA blocks guide the successive stages of the encoder and decoder by interrelating pixels of the same class. To emphasize relevant context, an attention mechanism is placed between the encoder and decoder after aggregating the global context via an 8-level Pyramid Pooling Network (PPN). Qualitative and quantitative results of the proposed and existing semantic segmentation architectures are evaluated on the dataset provided by Kaiser et al. Further, we show that the proposed O-SegNet architecture outperforms state-of-the-art techniques by accurately preserving road connectivity and the structure of buildings. © 2017 IEEE.
  • Item
    DIResUNet: Architecture for multiclass semantic segmentation of high resolution remote sensing imagery data
    (Springer, 2022) Priyanka; Sravya, N.; Lal, S.; Nalini, J.; Chintala, C.S.; Dell’Acqua, F.
    Scene understanding is an important task in information extraction from high-resolution aerial images, an operation often involved in remote sensing applications. Recently, semantic segmentation using deep learning has become an important method for achieving state-of-the-art performance in pixel-level classification of objects. The latter is still a challenging task due to large pixel variance within classes, possibly coupled with small pixel variance between classes. This paper proposes an artificial-intelligence (AI)-based approach to this problem by designing the DIResUNet deep learning model. The model is built by integrating the inception module, a modified residual block, and a dense global spatial pyramid pooling (DGSPP) module, in combination with the well-known U-Net scheme. The modified residual blocks and the inception module extract multi-level features, whereas DGSPP extracts contextual intelligence. In this way, both local and global information about the scene are extracted in parallel using dedicated processing structures, resulting in a more effective overall approach. The performance of the proposed DIResUNet model is evaluated on the Landcover and WHDLD high-resolution remote sensing (HRRS) datasets. We compared DIResUNet's performance with recent benchmark models such as U-Net, UNet++, Attention UNet, FPN, UNet+SPP, and DGRNet to prove the effectiveness of the proposed model. Results show that the proposed DIResUNet model outperforms the benchmark models on both HRRS datasets. © 2022, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
  • Item
    Semantic segmentation of low magnification effusion cytology images: A semi-supervised approach
    (Elsevier Ltd, 2022) Aboobacker, S.; Vijayasenan, D.; Sumam David, S.; Suresh, P.K.; Sreeram, S.
    Cytopathologists examine microscopic images obtained at various magnifications to identify malignancy in effusions. They locate malignant cell clusters at a low magnification and then zoom in to investigate cell-level features at a high magnification. This study predicts malignancy at low magnification levels such as 4X and 10X in effusion cytology images to reduce scanning time. However, the most challenging problem is annotating the low magnification images, particularly the 4X images. This paper extends two semi-supervised learning (SSL) models, MixMatch and FixMatch, to semantic segmentation. The original FixMatch and MixMatch algorithms are designed for classification tasks; when image augmentation is performed, the generated pseudo labels are spatially altered. We introduce reverse augmentation to compensate for the effect of these spatial alterations. The extended models are trained using labelled 10X and unlabelled 4X images. The average F-score of benign and malignant pixels on predictions of 4X images improves by approximately 9% for both Extended MixMatch and Extended FixMatch compared with the baseline model. With Extended MixMatch, 62% of the sub-regions of low magnification images are eliminated from scanning at a higher magnification, thereby saving scanning time. © 2022 Elsevier Ltd
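    The reverse-augmentation idea as described: spatial augmentations move pixels, so a pseudo label predicted on the augmented image must be mapped back before it can supervise the original image. A minimal sketch with a horizontal flip as the augmentation and a hypothetical thresholding "model" (both are illustrative assumptions, not the paper's actual network):

    ```python
    import numpy as np

    def augment(img):
        """Spatial augmentation: horizontal flip (pixels change position)."""
        return img[:, ::-1]

    def reverse_augment(label):
        """Undo the flip so the pseudo label realigns with the original image."""
        return label[:, ::-1]

    def toy_model(img):
        """Stand-in segmenter: foreground where intensity exceeds 0.5."""
        return (img > 0.5).astype(np.uint8)

    img = np.array([[0.9, 0.1],
                    [0.2, 0.8]])
    pseudo = toy_model(augment(img))   # predicted in the augmented frame
    aligned = reverse_augment(pseudo)  # mapped back to the original frame
    ```

    Without the reverse step, the consistency loss would compare each pixel's prediction against a pseudo label from a different spatial location, which is exactly the mismatch the paper's extension corrects.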
  • Item
    A Dual Phase Approach for Addressing Class Imbalance in Land-Use and Land-Cover Mapping From Remotely Sensed Images
    (Institute of Electrical and Electronics Engineers Inc., 2024) Putty, A.; Annappa, B.; Prajwal, R.; Pariserum Perumal, S.P.
    Semantic segmentation of remotely sensed images into land-use and land-cover classes plays a significant role in various ecosystem management applications. State-of-the-art results in assigning land-use and land-cover classes are primarily achieved using fully convolutional encoder-decoder architectures. However, the uneven distribution of land-use and land-cover classes is a major hurdle, skewing performance towards majority classes over minority classes. This paper proposes a novel dual-phase training scheme: the first phase introduces a new undersampling technique based on minority-class-focused class normalization, and the second phase uses the learnt knowledge for ensembling, to prevent overfitting and compensate for the information lost through undersampling. The proposed method achieved an overall performance gain of up to 2% in MIoU, Kappa, and F1-score metrics and up to 3% in class-wise F1-score compared to the baseline models on the Wuhan Dense Labeling, Vaihingen, and Potsdam datasets. © 2013 IEEE.
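    The abstract's first phase undersamples to rebalance classes, but the exact class-normalization rule is not given here. The sketch below shows only the general idea behind such undersampling: keep every training tile that contains the minority class and drop most majority-only tiles. The function name, keep fraction, and tile layout are assumptions for illustration:

    ```python
    import numpy as np

    def undersample_tiles(tiles, minority_class, keep_frac=0.25, seed=0):
        """Keep all tiles containing the minority class; keep a random
        keep_frac of the remaining majority-only tiles."""
        rng = np.random.default_rng(seed)
        kept = []
        for tile in tiles:
            if (tile == minority_class).any() or rng.random() < keep_frac:
                kept.append(tile)
        return kept

    # 8 background-only tiles plus one tile containing minority class 1
    tiles = [np.zeros((2, 2), dtype=int) for _ in range(8)]
    tiles.append(np.ones((2, 2), dtype=int))
    balanced = undersample_tiles(tiles, minority_class=1, keep_frac=0.0)
    ```

    Dropping majority-only tiles discards information, which is why the paper's second phase ensembles models trained on different samples rather than relying on a single undersampled run.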