Conference Papers
Permanent URI for this collection: https://idr.nitk.ac.in/handle/123456789/28506
Search Results
Item: Attention based Image Captioning using Depth-wise Separable Convolution (Institute of Electrical and Electronics Engineers Inc., 2021) Mallick, V.R.; Naik, D.
Automatically generating descriptions for an image has been one of the trending topics in the field of computer vision, because various real-life applications such as self-driving cars and Google image search depend on it. The backbone of this work is the encoder-decoder architecture of deep learning. The basic image captioning model has a CNN as the encoder and an RNN as the decoder. Various deep CNNs such as VGG-16, VGG-19, ResNet and Inception have been explored, but Xception, despite its comparatively better performance, is not widely used in this field. Likewise, for the decoder, GRU has not been used much, despite being comparatively faster than LSTM. With this in mind, and drawn by the accuracy of Xception and the efficiency of GRU, we propose an architecture for the image captioning task with Xception as the encoder and GRU as the decoder, with an attention mechanism. © 2021 IEEE.

Item: Comparitive Study of GRU and LSTM Cells Based Video Captioning Models (Institute of Electrical and Electronics Engineers Inc., 2021) Maru, H.; Chandana, T.S.S.; Naik, D.
The video captioning task involves generating descriptive text for the events and objects in videos. It takes a video, which is a sequence of frames, as input from the user and returns one or more sentences (sequences of words). A lot of research has been done in the area of video captioning. Most of this work uses Long Short-Term Memory (LSTM) units to avoid the vanishing gradient problem. In this work, we propose to implement a video captioning model using Gated Recurrent Units (GRUs), an attention mechanism and word embeddings, and to compare its functionality and results with traditional models that use LSTMs or Recurrent Neural Networks (RNNs).
We train and test our model on the standard MSVD (Microsoft Research Video Description Corpus) dataset. We use a wide range of performance metrics, namely BLEU, METEOR, ROUGE-1, ROUGE-2 and ROUGE-L, to evaluate its performance. © 2021 IEEE.

Item: Effect of Batch Normalization and Stacked LSTMs on Video Captioning (Institute of Electrical and Electronics Engineers Inc., 2021) Sarathi, V.; Mujumdar, A.; Naik, D.
Integrating visual content with natural language to generate image or video descriptions has been a challenging task for many years. Recent research in image captioning using Long Short-Term Memory (LSTM) has motivated its possible application to video captioning, where a video is converted into an array of frames, or images, and this array, along with the captions for the video, is used to train the LSTM network to associate the video with sentences. However, very little is known about using fine-tuning techniques such as batch normalization or stacked LSTM models in video captioning and how they affect the performance of the model. For this project, we compare the performance of the base model described in [1] with variants that add batch normalization and stacked LSTMs, using the base model as our reference. © 2021 IEEE.

Item: Image Captioning with Attention Based Model (Institute of Electrical and Electronics Engineers Inc., 2021) Yv, S.S.; Choubey, Y.; Naik, D.
Automatically describing the content of an image is a fundamental problem in artificial intelligence that connects computer vision and natural language processing (NLP). In the proposed work, a generative model based on a deep recurrent architecture is presented that combines recent developments in machine learning and computer vision and describes the image using natural language phrases. Given the training image, the model is trained to maximize the likelihood of the target description sentence.
Experiments performed on several datasets demonstrate that the efficiency of the model, its accuracy and the language it learns depend only on the image descriptions. © 2021 IEEE.

Item: Describing Image with Attention based GRU (Institute of Electrical and Electronics Engineers Inc., 2021) Mallick, V.R.; Naik, D.
Generating descriptions for images is a popular research topic. In the encoder-decoder model, a CNN works as the encoder to encode the image and passes the encoding to a decoder RNN as input to generate the image description in natural language sentences. LSTM is widely used as the RNN decoder. The attention mechanism has also played an important role in this field by enhancing object detection. Inspired by these recent advances in computer vision, we used GRU in place of LSTM as the decoder for our image captioning model. We incorporated an attention mechanism with the GRU decoder to enhance the precision of the generated captions. GRU has fewer tensor operations than LSTM, and hence will be faster to train. © 2021 IEEE.
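
The claim in the last abstract above, that a GRU involves fewer tensor operations than an LSTM, can be illustrated with a rough parameter count. The sketch below is not taken from any of the listed papers. It assumes the textbook formulation of both cells (one weight block and one bias vector per gate or candidate), and the function names and layer sizes are hypothetical. Library implementations may count biases differently, so exact numbers can vary.

```python
def block_params(input_size: int, hidden_size: int) -> int:
    """Parameters in one gate/candidate block: W (hidden x input),
    U (hidden x hidden) and one bias vector (hidden)."""
    return hidden_size * (input_size + hidden_size) + hidden_size


def lstm_params(input_size: int, hidden_size: int) -> int:
    # Textbook LSTM: input gate, forget gate, output gate, candidate cell state.
    return 4 * block_params(input_size, hidden_size)


def gru_params(input_size: int, hidden_size: int) -> int:
    # Textbook GRU: update gate, reset gate, candidate hidden state.
    return 3 * block_params(input_size, hidden_size)


if __name__ == "__main__":
    # Hypothetical sizes: 256-dim word embeddings, 512 hidden units.
    x, h = 256, 512
    print("LSTM parameters:", lstm_params(x, h))
    print("GRU parameters: ", gru_params(x, h))
    print("GRU / LSTM ratio:", gru_params(x, h) / lstm_params(x, h))
```

Under these assumptions a GRU layer carries three quarters of the parameters of an LSTM layer of the same size, which is consistent with, though not proof of, the faster training the abstract expects.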
