Journal Articles
Permanent URI for this collection: https://idr.nitk.ac.in/handle/123456789/19884
9 results
Search Results
Item: Optimization of contour based template matching using GPGPU based hexagonal framework (Machine Intelligence Research (MIR) Labs contact@mirlabs.org, 2015)
Bhagya, M.; Tripathi, S.; Santhi Thilagam, P.
This paper presents a technique to optimize contour based template matching by using general purpose computation on graphics processing units (GPGPU). Contour based template matching requires edge detection and a search for the template across the entire image, so a real-time implementation is not trivial. Using the proposed solution, we achieved an implementation fast enough to process standard video (640x480) in real time with sufficient accuracy.

Item: An efficient cuckoo search algorithm based multilevel thresholding for segmentation of satellite images using different objective functions (Elsevier Ltd, 2016)
Suresh, S.; Lal, S.
Satellite image segmentation is challenging due to the presence of weakly correlated and ambiguous multiple regions of interest. Several bio-inspired algorithms were developed to generate optimum threshold values for segmenting such images efficiently. Their exhaustive search nature makes them computationally expensive when extended to multilevel thresholding. In this paper, we propose a computationally efficient image segmentation algorithm, called CSMcCulloch, which incorporates McCulloch's method for Lévy flight generation in the Cuckoo Search (CS) algorithm. We have also investigated the impact of Mantegna's method for Lévy flight generation in the CS algorithm (CSMantegna) by comparing it with the conventional CS algorithm, which uses a simplified version of the same method. The CSMantegna algorithm improved segmentation quality at the expense of computational time.
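The Lévy flight generation by Mantegna's method mentioned above can be sketched as follows. This is a minimal illustration of the standard Mantegna algorithm, not the authors' code: the function names `mantegna_sigma` and `levy_step` are hypothetical, and the default stability index `beta = 1.5` is an assumed value that the abstract does not state.

```python
import math
import random


def mantegna_sigma(beta):
    """Standard deviation of the numerator Gaussian in Mantegna's algorithm."""
    num = math.gamma(1.0 + beta) * math.sin(math.pi * beta / 2.0)
    den = math.gamma((1.0 + beta) / 2.0) * beta * 2.0 ** ((beta - 1.0) / 2.0)
    return (num / den) ** (1.0 / beta)


def levy_step(beta=1.5, rng=None):
    """Draw one heavy-tailed step length: u / |v|^(1/beta).

    beta=1.5 is an assumed default, not a value taken from the paper.
    """
    rng = rng if rng is not None else random
    u = rng.gauss(0.0, mantegna_sigma(beta))
    v = rng.gauss(0.0, 1.0)
    while v == 0.0:  # guard against division by zero
        v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1.0 / beta)
```

In a cuckoo search loop, a step of this kind would typically be scaled and added to a candidate threshold vector. Most draws are small, with occasional long jumps that help the search escape local optima.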
The performance of the proposed CSMcCulloch algorithm is compared with other bio-inspired algorithms, namely Particle Swarm Optimization (PSO), Darwinian Particle Swarm Optimization (DPSO), Artificial Bee Colony (ABC), Cuckoo Search (CS) and CSMantegna, using Otsu's method, Kapur entropy and Tsallis entropy as objective functions. Experimental results were validated by measuring PSNR, MSE, FSIM and CPU running time for all the cases investigated. The proposed CSMcCulloch algorithm emerged as the most promising and computationally efficient for segmenting satellite images. Convergence rate analysis also reveals that the proposed algorithm outperforms the others in attaining stable global optimum thresholds. The experimental results encourage related research in computer vision, remote sensing and image processing applications. © 2016 Elsevier Ltd. All rights reserved.

Item: Fast interactive superpixel based image region generation (Blue Eyes Intelligence Engineering and Sciences Publication, 2019)
Naik, D.; Muhammed Shameem, P.K.
Image segmentation has always been a problem of interest and a challenging task in computer vision. It plays a vital role in object detection and recognition. Identifying and separating a region of interest from a complicated image is easy for the human visual system, but cumbersome to automate. The proposed work is a novel combined technique for fast segmentation of the foreground (area of interest) from an image that has a background and other complications. This work utilizes the latest industrial class technologies with advanced algorithms. Our approach remarkably increased performance by working on a super-pixelated image rather than a normal n x n pixel image. The proposed work is mainly focused on interactive segmentation and could be used in fields such as medical analysis.
Our segmentation technique is a binary segmentation that classifies pixels into two distinct sets. The proposed scheme is experimentally shown to compare favorably with contemporary interactive image segmentation schemes when applied to color and gray-scale images. © BEIESP.

Item: L, r-Stitch Unit: Encoder-Decoder-CNN Based Image-Mosaicing Mechanism for Stitching Non-Homogeneous Image Sequences (Institute of Electrical and Electronics Engineers Inc., 2021)
Chilukuri, P.K.; Padala, P.; Padala, P.; Desanamukula, V.S.; Pvgd, P.R.
Image stitching (or mosaicing) is an active research topic with numerous use cases in the computer vision, AR/VR and computer graphics domains, but the need to maintain homogeneity among the input image sequences during stitching is considered a primary limitation. To tackle this limitation, this article introduces a robust and reliable image stitching methodology (l,r-Stitch Unit), which takes multiple non-homogeneous image sequences as input and generates a reliable panoramically stitched wide view as the final output. The l,r-Stitch Unit consists of pre-processing and post-processing sub-modules and an l,r-PanoED-network, where each sub-module is a robust ensemble of several deep-learning, computer-vision and image-handling techniques. This article also introduces a novel convolutional encoder-decoder deep neural network (l,r-PanoED-network) with a unique split-encoding-network methodology to stitch non-coherent left and right stereo image pairs. The encoder network of the proposed l,r-PanoED extracts semantically rich deep feature maps from the input to stitch/map them into a wide panoramic domain; the feature-extraction and feature-mapping operations are performed simultaneously in the encoder network, based on the split-encoding-network methodology.
The decoder network of l,r-PanoED adaptively reconstructs the output panoramic view from the encoder networks' bottleneck feature maps. The proposed l,r-Stitch Unit has been rigorously benchmarked against alternative image-stitching methodologies on our custom-built traffic dataset and several other public datasets. Multiple evaluation metrics (SSIM, PSNR, MSE, L_{\alpha,\beta,\gamma}, FM-rate, average latency time) and wild conditions (rotational/color/intensity variances, noise, etc.) were considered during the benchmarking analysis. Based on the results, the proposed method outperformed the other image-stitching methodologies and proved effective even on wild, non-homogeneous inputs. © 2013 IEEE.

Item: An improved edge detection technique (Inderscience Publishers, 2021)
Meherhomji, V.; Shenoy, K.B.A.
Traditional edge detection methods tend to apply a single threshold over the entire image. However, natural images rarely have uniform illumination throughout, so a single threshold across the image is insufficient. This paper explores a method that recursively divides an image into regions and assigns each region an optimal threshold. For each region, the threshold is calculated automatically using Otsu's binarisation method. The method's key goal is to reduce the effect of noise present in images, which leads to the elimination of false edges, while ensuring that true edges present within the image are not lost. We have proved that the asymptotic time complexity of the proposed method is O(MN log(min{M, N})). We have compared the performance of our method with the Canny edge detection technique, a well-known and widely used edge detector which outperforms all the classical edge detection techniques. The results show that our method outperforms the Canny edge detection technique.
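The idea of giving each region its own Otsu threshold, as described above, can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract does not say how regions are chosen, so the fixed-depth quadrant split below is an assumption, and the names `otsu_threshold` and `region_thresholds` are hypothetical. Images are plain lists of rows holding integer gray levels from 0 to 255.

```python
def otsu_threshold(pixels):
    """Return the gray level t maximising between-class variance.

    Pixels <= t form one class, pixels > t the other.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0 = 0      # number of pixels in the lower class
    sum0 = 0    # sum of gray levels in the lower class
    best_t, best_var = 0, -1.0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def region_thresholds(img, depth):
    """Split img into quadrants `depth` times (an assumed splitting rule)
    and return (top, left, height, width, threshold) for each leaf region."""
    def rec(top, left, h, w, d):
        if d == 0 or h < 2 or w < 2:
            pixels = [img[r][c]
                      for r in range(top, top + h)
                      for c in range(left, left + w)]
            return [(top, left, h, w, otsu_threshold(pixels))]
        h2, w2 = h // 2, w // 2
        quads = ((top, left, h2, w2), (top, left + w2, h2, w - w2),
                 (top + h2, left, h - h2, w2),
                 (top + h2, left + w2, h - h2, w - w2))
        out = []
        for q in quads:
            out += rec(q[0], q[1], q[2], q[3], d - 1)
        return out
    return rec(0, 0, len(img), len(img[0]), depth)
```

A dark region and a bright region in the same image then receive different thresholds, which is the behaviour a single global threshold cannot provide.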
PSNR values for our method are much higher than those of the Canny edge detection algorithm for almost all the images considered from the BSD500 benchmark dataset. © 2021 Inderscience Enterprises Ltd.

Item: Semantic context driven language descriptions of videos using deep neural network (Springer Science and Business Media Deutschland GmbH, 2022)
Naik, D.; Jaidhar, C.D.
The massive addition of text, image, and video data to the internet has made computer vision-based tasks challenging in the big data domain. Recent exploration of video data and progress in visual information captioning has been an arduous task in computer vision. Visual captioning involves integrating visual information with natural language descriptions. This paper proposes an encoder-decoder framework with a 2D-Convolutional Neural Network (CNN) model and layered Long Short Term Memory (LSTM) as the encoder, and an LSTM model integrated with an attention mechanism as the decoder, trained with a hybrid loss function. Visual feature vectors extracted from the video frames using the 2D-CNN model capture spatial features. These visual feature vectors are then fed into the layered LSTM to capture the temporal information. The attention mechanism enables the decoder to perceive and focus on relevant objects and to correlate the visual context and language content for producing semantically correct captions. The visual features and GloVe word embeddings are input into the decoder to generate natural semantic descriptions for the videos. The performance of the proposed framework is evaluated on the video captioning benchmark dataset Microsoft Video Description (MSVD) using various well-known evaluation metrics. The experimental findings indicate that the suggested framework outperforms state-of-the-art techniques.
Compared to the state-of-the-art research methods, the proposed model significantly increased all measures, B@1, B@2, B@3, B@4, METEOR, and CIDEr, with scores of 78.4, 64.8, 54.2, 43.7, 32.3, and 70.7, respectively. The improvement in all scores indicates a better grasp of the context of the inputs, which results in more accurate caption prediction. © 2022, The Author(s).

Item: A novel Multi-Layer Attention Framework for visual description prediction using bidirectional LSTM (Springer Science and Business Media Deutschland GmbH, 2022)
Naik, D.; Jaidhar, C.D.
The massive influx of text, images, and videos to the internet has recently increased the challenge of computer vision-based tasks in big data. Integrating visual data with natural language to generate video explanations has been a challenge for decades. However, recent experiments on image/video captioning that employ Long Short-Term Memory (LSTM) have piqued the interest of researchers studying its possible application in video captioning. The proposed video captioning architecture combines a bidirectional multilayer LSTM (BiLSTM) encoder and a unidirectional decoder. The innovative architecture also considers temporal relations when creating superior global video representations. In contrast to the majority of prior work, the most relevant features of a video are selected and utilized specifically for captioning purposes. Existing methods utilize a single-layer attention mechanism for linking visual input with phrase meaning. This approach employs LSTMs and a multilayer attention mechanism to extract characteristics from movies, construct links between multi-modal (words and visual material) representations, and generate sentences with rich semantic coherence. In addition, we evaluated the performance of the suggested system using a benchmark dataset for video captioning.
The obtained results reveal superior performance relative to state-of-the-art works in METEOR and promising performance relative to the BLEU score. In terms of quantitative performance, the proposed approach outperforms most existing methodologies. © 2022, The Author(s).

Item: Investigation into facial expression recognition methods: a review (Institute of Advanced Engineering and Science, 2023)
Devarapalli, A.; Gonda, J.M.
Facial expression recognition (FER) is a rapidly emerging topic in computer vision that has attracted considerable interest because of its numerous applications in fields including psychology, sociology, human-computer interaction (HCI), and security. FER seeks to recognise and analyse human facial expressions in order to determine emotions and other mental states. Several strategies, including feature-based, kernel-based, and deep learning-based methods, have been developed and implemented in FER in recent years. FER's major goal is to extract and identify the most discriminating elements that accurately represent the emotions expressed by facial expressions. The literature reviewed in this field shows that deep learning-based methods have outperformed traditional feature-based and kernel-based methods in terms of accuracy and robustness in recognizing facial expressions. However, these deep learning-based methods also pose several challenges, such as the need for large labeled datasets, robustness to different facial poses and illumination conditions, and generalization to unseen data. Despite these challenges, the field of FER is expected to continue growing, and future research will likely focus on addressing these challenges and improving the accuracy and robustness of FER systems. © 2023 Institute of Advanced Engineering and Science.
All rights reserved.

Item: Video Captioning using Sentence Vector-enabled Convolutional Framework with Short-Connected LSTM (Springer, 2024)
Naik, D.; Jaidhar, C.D.
The principal objective of video/image captioning is to portray the dynamics of a video clip in plain natural language. Captioning is motivated by its ability to make video more accessible to deaf and hard-of-hearing individuals, to help people focus on and recall information more readily, and to allow viewing in sound-sensitive locations. The most frequently utilized design paradigm is the structurally improved encoder-decoder configuration. Recent developments emphasize the utilization of various creative structural modifications to maximize efficiency while demonstrating their viability in real-world applications. The utilization of well-known and well-researched technological advancements such as deep Convolutional Neural Networks (CNNs) and Sentence Transformers is trending in encoder-decoders. This paper proposes an approach for efficiently captioning videos using a CNN and a short-connected LSTM-based encoder-decoder model blended with a sentence context vector. This sentence context vector emphasizes the relationship between the video and text spaces. Inspired by the human visual system, the attention mechanism is utilized to selectively concentrate on the context of the important frames. Also, a contextual hybrid embedding block is presented for connecting the two vector spaces generated during the encoding and decoding stages. The proposed architecture is investigated with well-known CNN architectures and various word embeddings. It is assessed using two benchmark video captioning datasets, MSVD and MSR-VTT, considering standard evaluation metrics such as BLEU, METEOR, ROUGE, and CIDEr.
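As an illustration of how a caption metric such as BLEU scores a generated sentence, the sketch below computes the clipped (modified) n-gram precision that BLEU is built on. It is a simplified building block only: full BLEU also combines several n-gram orders and applies a brevity penalty, and the function names `ngrams` and `modified_precision` are hypothetical. It does not reproduce the evaluation code used in the paper.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of a candidate caption against references.

    Each candidate n-gram is credited at most as many times as it occurs
    in the single reference where it is most frequent.
    """
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, cnt in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], cnt)
    clipped = sum(min(cnt, max_ref[gram]) for gram, cnt in cand_counts.items())
    return clipped / sum(cand_counts.values())
```

Clipping is what stops a degenerate caption that repeats one common word from scoring well: a candidate made of seven copies of "the" gets a unigram precision of 2/7 against the reference "the cat is on the mat".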
In the experiments, when the proposed model with NASNet-Large alone is compared across all three embeddings, BERT performed better on the MSVD dataset than the other two embeddings. Inception-v4 outperformed VGG-16, ResNet-152, and NASNet-Large for feature extraction. Among the word embeddings, BERT is far superior to ELMo and GloVe on the MSR-VTT dataset. © 2023, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
