Faculty Publications
Permanent URI for this community: https://idr.nitk.ac.in/handle/123456789/18736
Publications by NITK Faculty
16 results
Search Results
Item Modeling of Human Face Expressions and Hand Movement for Animation (Institute of Electrical and Electronics Engineers Inc., 2018) Sangeetha, G.S.; Koolagudi, S.G.; Ramteke, P.B.; Singala, S.; Sastry, S.
Animation is the process of creating an illusion of motion by displaying images rapidly in an order in which consecutive images differ minimally from each other. In this paper, a solution is proposed to simplify the process of animation by tracking hand movement and facial expressions. Face detection is performed using a Haar cascade classifier, whereas hand detection is achieved using Otsu's binarization and the Ramer-Douglas-Peucker contour detection algorithm. Facial expression landmarks are captured from the Haar-like features. Hand movement feature points are extracted from the contour. The replay phase includes drawing the virtual object by calculating the translational factors and redrawing the virtual object in every frame during replay. The proposed approach is observed to achieve smooth translation of facial expressions and hand movement and to reduce the time and effort needed to create the animation. Copyright © INDIACom-2018.

Item Performance evaluation of deep learning frameworks on computer vision problems (Institute of Electrical and Electronics Engineers Inc., 2019) Nara, M.; Mukesh, B.R.; Padala, P.; Kinnal, B.
Deep Learning (DL) applications have skyrocketed in recent years and are being applied in various domains. There has been a tremendous surge in the development of DL frameworks to make implementation easier. In this paper, we aim to make a comparative study of GPU-accelerated deep learning software frameworks such as Torch and TensorFlow (with the Keras API). We attempt to benchmark the performance of these frameworks by implementing three different neural networks, each designed for a popular computer vision problem (MNIST, CIFAR-10, Fashion-MNIST). We performed this experiment in both CPU and GPU (Nvidia GeForce GTX 960M) settings.
The performance metrics used here include evaluation time, training time, and accuracy. This paper aims to act as a guide to selecting the most suitable framework for a particular problem. A special interest of the paper is to evaluate the performance lost due to the use of an API such as Keras, together with a comparative study of the performance of a user-defined neural network versus a standard network. Our interest also lies in the frameworks' performance when subjected to networks of different sizes. © 2019 IEEE.

Item Development of low-cost real-time driver drowsiness detection system using eye centre tracking and dynamic thresholding (Springer Verlag, 2020) Khan, F.; Sharma, S.
One in every five vehicle accidents on the road today is caused simply by driver fatigue. Fatigue, or drowsiness, significantly reduces the concentration and vigilance of the driver, thereby increasing the risk of inherent human error leading to injuries and fatalities. Hence, our primary motive is to reduce road accidents using a non-intrusive, image-processing-based alert system. In this regard, we have built a system that detects driver drowsiness by tracking and monitoring the pattern of the driver's eyes in real time. The stand-alone system consists of three interconnected components: a processor, a camera, and an alarm. After initial face detection, the eyes are located, extracted, and continuously monitored to check whether they are open or closed on the basis of a pixel-by-pixel method. When the eyes are seen to be closed for a certain amount of time, drowsiness is detected and an alarm is issued to alert the driver and thus prevent a casualty.
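The closed-eye timing logic described in this item can be sketched in a few lines. This is an illustrative approximation, not the authors' implementation: the function names, the dark-pixel ratio test, and every threshold value below are hypothetical stand-ins for the paper's pixel-by-pixel method and dynamic thresholding.

```python
import numpy as np

def eye_is_open(eye_region, dark_level=60, open_ratio=0.08):
    """Pixel-by-pixel check: an open eye exposes enough dark
    pupil/iris pixels, while a closed lid is mostly bright skin."""
    dark_fraction = np.mean(eye_region < dark_level)
    return dark_fraction >= open_ratio

def detect_drowsiness(eye_frames, max_closed_frames=15):
    """Return the frame index at which an alarm would fire, else None."""
    closed_streak = 0
    for i, frame in enumerate(eye_frames):
        if eye_is_open(frame):
            closed_streak = 0  # eyes reopened: reset the counter
        else:
            closed_streak += 1
            if closed_streak >= max_closed_frames:
                return i  # eyes closed too long: drowsiness detected
    return None
```

In the real system, the eye regions would come from the camera stream after face and eye localisation, and the alarm component would be triggered at the returned frame.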
© Springer Nature Switzerland AG 2020.

Item CNN-GRU: Transforming image into sentence using GRU and attention mechanism (Grenze Scientific Society, 2021) Saini, G.; Patil, N.
Recent advances in deep neural networks have attracted great attention in both Natural Language Processing (NLP) and Computer Vision (CV). They provide an efficient way of understanding semantic and syntactic structure, which can handle complex tasks such as automatic image captioning. Image captioning methodology is mainly based on the encoder-decoder approach. In the present work, we developed a CNN-GRU model using a Convolutional Neural Network (CNN), a Gated Recurrent Unit (GRU), and an attention mechanism. Here, VGG16 is used as the encoder, while the GRU and attention mechanism are used as the decoder. Our model has shown significant improvement over other state-of-the-art encoder-decoder models on the well-known MSCOCO data set. Further, the time taken to train and test our model is two-thirds that of other similar models such as CNN-CNN and CNN-RNN. © Grenze Scientific Society, 2021.

Item Weaklier-Supervised Semantic Segmentation with Pyramid Scene Parsing Network (Institute of Electrical and Electronics Engineers Inc., 2021) Naik, D.; Jaidhar, C.D.
Semantic image segmentation is an essential task of computer vision. It requires dividing visual input into different meaningful, interpretable categories. In this work, an image attribution and segmentation approach is proposed that can identify complex objects present in an image. The proposed model starts with superpixelization using Simple Linear Iterative Clustering (SLIC). A Multi Heat Map Slices Fusion (MSF) model produces an object seed heat map, and a Saliency Edge Colour Texture (SECT) model generates pixel-level annotations. Lastly, the PSPNet model develops the final semantic segmentation of the object. The proposed model was implemented and, compared with earlier work, achieved a better performance score.
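The SLIC superpixelization step that opens the pipeline above can be illustrated with a toy, single-iteration variant: cluster centers are seeded on a regular grid, and each pixel joins the nearest center under SLIC's combined colour-plus-spatial distance. The full algorithm iterates the assignment and updates the centers; the function name and all parameter values here are illustrative, not taken from the paper.

```python
import numpy as np

def slic_like_assign(image, grid_step=4, compactness=10.0):
    """One SLIC-style assignment pass: label each pixel with the index
    of its nearest grid-seeded center in colour + spatial distance."""
    image = np.asarray(image, dtype=float)
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    # Seed centers at the midpoints of grid cells of size `grid_step`.
    cy = np.arange(grid_step // 2, h, grid_step)
    cx = np.arange(grid_step // 2, w, grid_step)
    centers = [(y, x) for y in cy for x in cx]
    best_dist = np.full((h, w), np.inf)
    labels = np.zeros((h, w), dtype=int)
    for k, (y0, x0) in enumerate(centers):
        color_d = (image - image[y0, x0]) ** 2
        if color_d.ndim == 3:          # colour image: sum over channels
            color_d = color_d.sum(axis=2)
        spatial_d = (ys - y0) ** 2 + (xs - x0) ** 2
        # `compactness` trades colour fidelity against spatial regularity.
        dist = color_d + (compactness / grid_step) ** 2 * spatial_d
        closer = dist < best_dist
        labels[closer] = k
        best_dist[closer] = dist[closer]
    return labels
```

On a flat image this yields a regular grid of superpixels; near strong edges the colour term pulls superpixel boundaries to follow the edge, which is what makes superpixels useful seeds for the heat map and annotation stages.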
© 2021 IEEE.

Item Deep Learning Framework Based on Audio–Visual Features for Video Summarization (Springer Science and Business Media Deutschland GmbH, 2022) Rhevanth, M.; Ahmed, R.; Shah, V.; Mohan, B.R.
Video summarization (VS) techniques have garnered immense interest in recent years, leading to numerous applications in different computer vision domains, such as video extraction, image captioning, indexing, and browsing. Conventional VS studies often pursue the success of VS algorithms by adding high-quality features and clusters to pick representative visual elements. Many existing VS mechanisms take into consideration only the visual aspect of the video input, thereby ignoring the influence of audio features on the generated summary. To cope with such issues, we propose an efficient video summarization technique that processes both visual and audio content while extracting key frames from the raw video input. The structural similarity index is used to check similarity between frames, while mel-frequency cepstral coefficients (MFCCs) help in extracting features from the corresponding audio signals. By combining these two features, the redundant frames of the video are removed. The resultant key frames are refined using a deep convolutional neural network (CNN) model to retrieve a list of candidate key frames, which finally constitute the summary of the data. The proposed system is evaluated on video datasets from YouTube that contain events, which helps in better understanding the video summary. Experimental observations indicate that the inclusion of audio features and an efficient refinement technique, followed by an optimization function, provides better summary results than standard VS techniques.
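The SSIM-based redundancy check from the summarization item can be sketched as follows. This is a simplified, hypothetical version: it computes a single global SSIM per frame pair (the standard index averages SSIM over local windows) and drops any frame too similar to the last kept key frame; the MFCC audio branch and the CNN refinement stage are omitted.

```python
import numpy as np

def global_ssim(x, y, L=255.0):
    """Structural similarity computed once over whole frames, using the
    standard SSIM constants C1 = (0.01 L)^2 and C2 = (0.03 L)^2."""
    c1, c2 = (0.01 * L) ** 2, (0.03 * L) ** 2
    x, y = x.astype(float), y.astype(float)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return (((2 * mx * my + c1) * (2 * cov + c2))
            / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2)))

def select_keyframes(frames, threshold=0.8):
    """Keep a frame only if its SSIM against the last kept key frame
    falls below `threshold`, i.e. it adds visually new content."""
    keyframes = [0]
    for i in range(1, len(frames)):
        if global_ssim(frames[keyframes[-1]], frames[i]) < threshold:
            keyframes.append(i)
    return keyframes
```

Identical consecutive frames score an SSIM of 1.0 and are discarded, while a scene change scores near 0 and survives as a new key frame.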
© 2022, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

Item An Approach for Waste Classification Using Data Augmentation and Transfer Learning Models (Springer Science and Business Media Deutschland GmbH, 2023) Kumsetty, N.V.; Bhat Nekkare, A.B.; Kamath S., S.; Anand Kumar, M.
Waste segregation has become a daunting problem in the twenty-first century, as careless waste disposal manifests significant ecological and health concerns. Existing approaches to waste disposal rely primarily on incineration or landfilling, neither of which is sustainable. Hence, responsible recycling followed by adequate disposal is the optimal solution, promoting both environment-friendly practices and reuse. In this paper, a computer vision-based approach for automated waste classification across multiple classes of waste products is proposed. We focus on improving the quality of existing datasets using data augmentation and image processing techniques. We also experiment with transfer learning-based models such as ResNet and VGG for fast and accurate classification. The models were trained, validated, and tested on the benchmark TrashNet and TACO datasets. During experimental evaluation, the proposed model achieved 93.13% accuracy on TrashNet and outperformed state-of-the-art models by a margin of 16% on TACO. © 2023, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

Item Optimization of contour based template matching using GPGPU based hexagonal framework (Machine Intelligence Research (MIR) Labs, 2015) Bhagya, M.; Tripathi, S.; Santhi Thilagam, P.
This paper presents a technique to optimize contour-based template matching by using general-purpose computation on graphics processing units (GPGPU). Contour-based template matching requires edge detection and searching for the presence of a template in an entire image, a real-time implementation of which is not trivial.
Using the proposed solution, we could achieve an implementation fast enough to process standard video (640x480) in real time with sufficient accuracy.

Item An efficient cuckoo search algorithm based multilevel thresholding for segmentation of satellite images using different objective functions (Elsevier Ltd, 2016) Suresh, S.; Lal, S.
Satellite image segmentation is challenging due to the presence of weakly correlated and ambiguous multiple regions of interest. Several bio-inspired algorithms have been developed to generate optimum threshold values for segmenting such images efficiently. Their exhaustive search nature makes them computationally expensive when extended to multilevel thresholding. In this paper, we propose a computationally efficient image segmentation algorithm, called CSMcCulloch, incorporating McCulloch's method for Lévy flight generation in the Cuckoo Search (CS) algorithm. We have also investigated the impact of Mantegna's method for Lévy flight generation in the CS algorithm (CSMantegna) by comparing it with the conventional CS algorithm, which uses a simplified version of the same. The CSMantegna algorithm resulted in improved segmentation quality at the expense of computational time. The performance of the proposed CSMcCulloch algorithm is compared with that of other bio-inspired algorithms, such as the Particle Swarm Optimization (PSO) algorithm, the Darwinian Particle Swarm Optimization (DPSO) algorithm, the Artificial Bee Colony (ABC) algorithm, the Cuckoo Search (CS) algorithm, and the CSMantegna algorithm, using Otsu's method, Kapur entropy, and Tsallis entropy as objective functions. Experimental results were validated by measuring PSNR, MSE, FSIM, and CPU running time for all the cases investigated. The proposed CSMcCulloch algorithm proved to be the most promising and computationally efficient for segmenting satellite images. Convergence rate analysis also reveals that the proposed algorithm outperforms the others in attaining stable global optimum thresholds.
The experimental results encourage related research in computer vision, remote sensing, and image processing applications. © 2016 Elsevier Ltd. All rights reserved.

Item Fast interactive superpixel based image region generation (Blue Eyes Intelligence Engineering and Sciences Publication, 2019) Naik, D.; Muhammed Shameem, P.K.
Image segmentation has always been a problem of interest and a challenging task in the field of computer vision systems. It plays a vital role in object detection and recognition. Identifying and separating a region of interest from a complicated image is easy for the human vision system, but the same is cumbersome to automate. The proposed work is a novel combined technique for fast segmentation of the foreground (area of interest) out of an image that has a background and other complications. This work utilizes the latest industrial-class technologies with advanced algorithms. Our approach remarkably increases performance by working on a superpixelated image rather than a normal n x n pixel image. The proposed work is mainly focused on interactive segmentation, which could be actively used in fields such as medical analysis. Our segmentation technique is a binary segmentation that classifies pixels into two distinct sets. The proposed scheme is experimentally shown to compare favorably with contemporary interactive image segmentation schemes when applied to colored and gray-scale images. © BEIESP.
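The superpixel idea behind the last item, classifying whole superpixels into two sets instead of individual pixels, can be sketched with a hypothetical nearest-seed rule: average the user-marked foreground and background superpixel colours, then assign every remaining superpixel to the closer model. This only illustrates why superpixelation speeds up interactive binary segmentation; it is not the authors' algorithm, and all names are invented for the example.

```python
import numpy as np

def segment_superpixels(sp_means, fg_seeds, bg_seeds):
    """sp_means: (n, c) array of mean colour per superpixel.
    fg_seeds / bg_seeds: indices of user-marked superpixels.
    Returns a boolean array, True where a superpixel is foreground."""
    sp_means = np.asarray(sp_means, dtype=float)
    fg_model = sp_means[fg_seeds].mean(axis=0)   # foreground colour model
    bg_model = sp_means[bg_seeds].mean(axis=0)   # background colour model
    d_fg = np.linalg.norm(sp_means - fg_model, axis=1)
    d_bg = np.linalg.norm(sp_means - bg_model, axis=1)
    return d_fg < d_bg
```

Because an image of hundreds of thousands of pixels typically reduces to a few hundred superpixels, this per-superpixel decision is orders of magnitude cheaper than a per-pixel one, which is the performance gain the abstract describes.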
