Conference Papers

Permanent URI for this collection: https://idr.nitk.ac.in/handle/123456789/28506


Search Results

Now showing 1 - 7 of 7
  • Item
    Modeling of Human Face Expressions and Hand Movement for Animation
    (Institute of Electrical and Electronics Engineers Inc., 2018) Sangeetha, G.S.; Koolagudi, S.G.; Ramteke, P.B.; Singala, S.; Sastry, S.
    Animation is the process of creating an illusion of motion by displaying images in rapid succession, each differing minimally from the previous one. In this paper, a solution is proposed to simplify the animation process by tracking hand movement and facial expressions. Face detection is performed using the Haar cascade classifier, whereas hand detection is achieved using Otsu's binarization and the Ramer-Douglas-Peucker contour-simplification algorithm. Facial expression landmarks are captured from the Haar-like features, and hand-movement feature points are extracted from the contour. The replay phase draws the virtual object by calculating the translational factors and redraws it in every frame. The proposed approach is observed to achieve smooth translation of facial expressions and hand movement and to reduce the time and effort needed to produce the animation. Copyright © INDIACom-2018.
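    The Ramer-Douglas-Peucker contour simplification the abstract mentions can be sketched in pure Python (a minimal illustration, not the authors' implementation; the `epsilon` tolerance is an assumed parameter):

```python
import math

def rdp(points, epsilon):
    """Ramer-Douglas-Peucker: simplify a polyline, keeping only points
    that deviate from the start-end chord by more than epsilon."""
    if len(points) < 3:
        return list(points)
    (x1, y1), (x2, y2) = points[0], points[-1]
    dx, dy = x2 - x1, y2 - y1
    norm = math.hypot(dx, dy) or 1.0
    # Find the interior point farthest from the chord.
    dmax, index = 0.0, 0
    for i in range(1, len(points) - 1):
        px, py = points[i]
        d = abs(dy * px - dx * py + x2 * y1 - y2 * x1) / norm
        if d > dmax:
            dmax, index = d, i
    if dmax > epsilon:
        # Split at the farthest point and simplify both halves recursively.
        left = rdp(points[:index + 1], epsilon)
        right = rdp(points[index:], epsilon)
        return left[:-1] + right
    return [points[0], points[-1]]
```

    A larger `epsilon` yields fewer feature points from the hand contour; a smaller one preserves more detail.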
  • Item
    Performance evaluation of deep learning frameworks on computer vision problems
    (Institute of Electrical and Electronics Engineers Inc., 2019) Nara, M.; Mukesh, B.R.; Padala, P.; Kinnal, B.
    Deep Learning (DL) applications have skyrocketed in recent years and are being applied across various domains, and there has been a tremendous surge in the development of DL frameworks to make implementation easier. In this paper, we present a comparative study of GPU-accelerated deep learning frameworks, namely Torch and TensorFlow (with the Keras API). We benchmark the performance of these frameworks by implementing three different neural networks, each designed for a popular computer vision dataset (MNIST, CIFAR-10, Fashion-MNIST). We performed the experiments in both CPU and GPU (Nvidia GeForce GTX 960M) settings. The performance metrics used are evaluation time, training time, and accuracy. This paper aims to serve as a guide for selecting the most suitable framework for a particular problem. Of particular interest is the performance overhead incurred by using a high-level API such as Keras, as well as a comparison of performance on a user-defined neural network versus a standard network. We also examine how the frameworks perform on networks of different sizes. © 2019 IEEE.
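    The training-time and evaluation-time metrics reduce to a small wall-clock harness (an illustrative sketch only; `train_fn` and `eval_fn` are stand-ins for the framework-specific training and inference calls):

```python
import time

def benchmark(train_fn, eval_fn, repeats=3):
    """Time a framework's training and evaluation callables.

    Returns the mean wall-clock seconds for each phase, mirroring the
    training-time / evaluation-time metrics used in the comparison."""
    def mean_time(fn):
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            fn()
            samples.append(time.perf_counter() - start)
        return sum(samples) / repeats
    return {"train_s": mean_time(train_fn), "eval_s": mean_time(eval_fn)}
```

    The same harness can wrap each framework's fit and predict calls so that CPU and GPU runs are timed identically.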
  • Item
    Development of low-cost real-time driver drowsiness detection system using eye centre tracking and dynamic thresholding
    (Springer Verlag, 2020) Khan, F.; Sharma, S.
    One in every five vehicle accidents on the road today is caused simply by driver fatigue. Fatigue, or drowsiness, significantly reduces the driver's concentration and vigilance, thereby increasing the risk of human error leading to injuries and fatalities. Our primary motive is therefore to reduce road accidents using a non-intrusive, image-processing-based alert system. To this end, we have built a system that detects driver drowsiness by tracking and monitoring the pattern of the driver's eyes in real time. The standalone system consists of three interconnected components: a processor, a camera, and an alarm. After initial face detection, the eyes are located, extracted, and continuously monitored to check whether they are open or closed using a pixel-by-pixel method. When the eyes remain closed for a certain amount of time, drowsiness is detected and an alarm is issued to alert the driver and thereby prevent a casualty. © Springer Nature Switzerland AG 2020.
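    The pixel-by-pixel open/closed check and the consecutive-frame alarm logic might look like the following (a hypothetical sketch; the darkness cutoff, dynamic threshold, and frame limit are assumed values, not those of the paper):

```python
def eye_closed(eye_pixels, threshold):
    """Pixel-by-pixel check on a greyscale eye region: the eye is taken
    as closed when the fraction of dark (pupil/iris) pixels drops below
    a dynamic threshold, since a closed lid shows mostly bright skin."""
    dark = sum(1 for p in eye_pixels if p < 60)  # 60: assumed darkness cutoff
    return dark / len(eye_pixels) < threshold

class DrowsinessMonitor:
    """Raise the alarm when the eyes stay closed for `limit` consecutive frames."""
    def __init__(self, limit=15):
        self.limit = limit
        self.closed_frames = 0

    def update(self, eye_pixels, threshold=0.1):
        if eye_closed(eye_pixels, threshold):
            self.closed_frames += 1
        else:
            self.closed_frames = 0
        return self.closed_frames >= self.limit  # True -> sound the alarm
```

    Resetting the counter on every open-eye frame ensures a normal blink never triggers the alarm.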
  • Item
    CNN-GRU: Transforming image into sentence using GRU and attention mechanism
    (Grenze Scientific Society, 2021) Saini, G.; Patil, N.
    Recent advances in deep neural networks have attracted great attention in both Natural Language Processing (NLP) and Computer Vision (CV). They provide an efficient way of understanding semantic and syntactic structure, making it possible to tackle complex tasks such as automatic image captioning. Image captioning methods are mainly based on the encoder-decoder approach. In the present work, we developed a CNN-GRU model using a Convolutional Neural Network (CNN), a Gated Recurrent Unit (GRU), and an attention mechanism. Here VGG16 is used as the encoder, while the GRU and attention mechanism serve as the decoder. Our model shows significant improvement over other state-of-the-art encoder-decoder models on the well-known MSCOCO data set. Further, the time taken to train and test our model is two-thirds that of similar models such as CNN-CNN and CNN-RNN. © Grenze Scientific Society, 2021.
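    The attention step of such a decoder can be illustrated with dot-product scoring over the encoder's feature vectors (a simplified stand-in; the paper's exact scoring function is not specified in the abstract):

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_context(features, hidden):
    """Score each encoder feature vector against the decoder hidden state
    (dot product here for brevity), then return the attention-weighted
    context vector the GRU would consume at each decoding step."""
    scores = [sum(f_i * h_i for f_i, h_i in zip(f, hidden)) for f in features]
    weights = softmax(scores)
    dim = len(features[0])
    context = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return context, weights
```

    The weights sum to one, so the context vector is a convex combination of the image features, focused on the regions most relevant to the current word.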
  • Item
    Weaklier-Supervised Semantic Segmentation with Pyramid Scene Parsing Network
    (Institute of Electrical and Electronics Engineers Inc., 2021) Naik, D.; Jaidhar, C.D.
    Semantic image segmentation is an essential task in computer vision: it requires dividing visual input into meaningful, interpretable categories. In this work, an image attribution and segmentation approach is proposed that can identify complex objects present in an image. The proposed model starts with superpixelization using Simple Linear Iterative Clustering (SLIC). A Multi Heat Map Slices Fusion (MSF) model produces an object seed heat map, and a Saliency Edge Colour Texture (SECT) model generates pixel-level annotations. Lastly, the PSPNet model develops the final semantic segmentation of the object. The proposed model was implemented and, compared with earlier work, achieved a higher performance score. © 2021 IEEE.
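    The SLIC superpixelization stage amounts to labelling each pixel with its nearest grid-sampled seed under a combined colour-and-space distance. A toy greyscale version, showing one assignment pass only (the compactness weight `m` is an assumed parameter; real SLIC iterates assignment and seed updates):

```python
import math

def slic_assign(image, n_seeds_x, n_seeds_y, m=10.0):
    """One assignment pass of a SLIC-style superpixelization on a 2-D
    greyscale image: seeds are sampled on a grid, and each pixel takes
    the label of the seed minimizing colour distance plus (m/s)-scaled
    spatial distance."""
    h, w = len(image), len(image[0])
    step_y, step_x = h / n_seeds_y, w / n_seeds_x
    s = math.sqrt(step_x * step_y)  # expected superpixel spacing
    seeds = []
    for gy in range(n_seeds_y):
        for gx in range(n_seeds_x):
            y, x = int((gy + 0.5) * step_y), int((gx + 0.5) * step_x)
            seeds.append((y, x, image[y][x]))
    labels = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            best, best_d = 0, float("inf")
            for k, (sy, sx, sc) in enumerate(seeds):
                d = abs(image[y][x] - sc) + (m / s) * math.hypot(y - sy, x - sx)
                if d < best_d:
                    best, best_d = k, d
            labels[y][x] = best
    return labels
```

    A larger `m` makes superpixels more compact and grid-like; a smaller `m` lets them follow colour boundaries more closely.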
  • Item
    Deep Learning Framework Based on Audio–Visual Features for Video Summarization
    (Springer Science and Business Media Deutschland GmbH, 2022) Rhevanth, M.; Ahmed, R.; Shah, V.; Mohan, B.R.
    Video summarization (VS) techniques have garnered immense interest in recent years, leading to numerous applications in computer vision domains such as video extraction, image captioning, indexing, and browsing. Conventional VS studies often pursue algorithmic success by adding high-quality features and clusters to pick representative visual elements. Many existing VS mechanisms take only the visual aspect of the video input into consideration, thereby ignoring the influence of audio features on the generated summary. To cope with these issues, we propose an efficient video summarization technique that processes both visual and audio content while extracting key frames from the raw video input. The structural similarity index is used to check similarity between frames, while mel-frequency cepstral coefficients (MFCCs) help in extracting features from the corresponding audio signals. By combining these two features, the redundant frames of the video are removed. The resulting key frames are refined using a deep convolutional neural network (CNN) model to retrieve a list of candidate key frames, which finally constitute the summary of the data. The proposed system is evaluated on video datasets from YouTube that contain events, which helps in better understanding the video summary. Experimental observations indicate that including audio features and an efficient refinement technique, followed by an optimization function, provides better summaries than standard VS techniques. © 2022, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
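    The redundancy-removal idea, keeping a frame only when it differs enough from the last kept one, can be sketched with mean absolute difference as a stand-in for the structural similarity index (illustrative only; the threshold is an assumed value and frames are flattened pixel lists):

```python
def mean_abs_diff(a, b):
    """Stand-in frame-difference measure (the paper uses SSIM instead)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def select_keyframes(frames, threshold=20.0):
    """Return indices of candidate key frames: a frame is kept only when
    it differs from the last kept frame by more than the threshold,
    dropping near-duplicate (redundant) frames."""
    if not frames:
        return []
    keep = [0]
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i], frames[keep[-1]]) > threshold:
            keep.append(i)
    return keep
```

    In the full pipeline the audio (MFCC) features would contribute a second difference term, and the surviving frames would then be refined by the CNN model.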
  • Item
    An Approach for Waste Classification Using Data Augmentation and Transfer Learning Models
    (Springer Science and Business Media Deutschland GmbH, 2023) Kumsetty, N.V.; Bhat Nekkare, A.B.; Kamath S., S.; Anand Kumar, M.
    Waste segregation has become a daunting problem in the twenty-first century, as careless waste disposal creates significant ecological and health concerns. Existing approaches to waste disposal rely primarily on incineration or landfilling, neither of which is sustainable. Hence, responsible recycling followed by adequate disposal is the optimal solution, promoting both environment-friendly practices and reuse. In this paper, a computer-vision-based approach for automated waste classification across multiple classes of waste products is proposed. We focus on improving the quality of existing datasets using data augmentation and image-processing techniques. We also experiment with transfer-learning-based models such as ResNet and VGG for fast and accurate classification. The models were trained, validated, and tested on the benchmark TrashNet and TACO datasets. During experimental evaluation, the proposed model achieved 93.13% accuracy on TrashNet and outperformed state-of-the-art models by a margin of 16% on TACO. © 2023, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
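    The data-augmentation step can be illustrated with flips and rotations on a small greyscale array (a minimal sketch; the paper's actual augmentation pipeline is not detailed in the abstract):

```python
def hflip(image):
    """Horizontal flip: mirror each row of a 2-D image."""
    return [row[::-1] for row in image]

def rot90(image):
    """Rotate a 2-D image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def augment(image):
    """Produce simple augmented variants (original, mirror, three
    rotations) used to enlarge a small waste-image dataset."""
    variants = [image, hflip(image)]
    rotated = image
    for _ in range(3):
        rotated = rot90(rotated)
        variants.append(rotated)
    return variants
```

    Each original image thus yields five training samples, which helps small benchmark datasets like TrashNet support larger transfer-learning models.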