Conference Papers
Permanent URI for this collection: https://idr.nitk.ac.in/handle/123456789/28506
Item Comparative Study of GRU and LSTM Cells Based Video Captioning Models (Institute of Electrical and Electronics Engineers Inc., 2021) Maru, H.; Chandana, T.S.S.; Naik, D.
Video captioning is the task of generating descriptive text for the events and objects in a video: the input is a video (a sequence of frames) and the output is one or more sentences (sequences of words). A great deal of research has been done on video captioning, most of it based on Long Short-Term Memory (LSTM) units, which mitigate the vanishing-gradient problem. In this work, we propose a video captioning model built from Gated Recurrent Units (GRUs), an attention mechanism, and word embeddings, and compare its behaviour and results with traditional models that use LSTMs or Recurrent Neural Networks (RNNs). We train and test our model on the standard MSVD (Microsoft Research Video Description Corpus) dataset and evaluate it with a range of metrics: BLEU, METEOR, ROUGE-1, ROUGE-2, and ROUGE-L. © 2021 IEEE.

Item Comparison of Image Encoder Architectures for Image Captioning (Institute of Electrical and Electronics Engineers Inc., 2021) Maru, H.; Chandana, T.S.S.; Naik, D.
Image captioning is a fascinating and challenging task that combines two fields of computer science: Natural Language Processing and Computer Vision. Several approaches have been tried, many based on a combination of Convolutional Neural Networks (CNNs) as encoders and Recurrent Neural Networks (RNNs) as decoders. This paper compares two CNN-based encoders, VGG16 (Visual Geometry Group) and InceptionV3, which are used to encode the input image features. The proposed work uses the popular Flickr8k dataset and also compares two loss functions with each of the above architectures: the categorical cross-entropy loss and the Kullback-Leibler divergence. Our main goal is to study the effect of the image encoder and the loss function on the image captioning task while keeping all other parameters the same. © 2021 IEEE.
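The GRU-vs-LSTM comparison in the first abstract rests on a standard structural fact: an LSTM cell has four gate blocks (input, forget, cell, output) while a GRU has three (reset, update, candidate), so a GRU of the same size carries about 25% fewer trainable parameters. A minimal sketch of that count, assuming the standard cell formulations (the function names and sizes here are illustrative, not taken from the papers):

```python
def lstm_params(input_size, hidden_size):
    # LSTM: 4 gate blocks, each with an input weight matrix (h x i),
    # a recurrent weight matrix (h x h), and a bias vector (h).
    return 4 * (hidden_size * (input_size + hidden_size) + hidden_size)

def gru_params(input_size, hidden_size):
    # GRU: 3 gate blocks (reset, update, candidate) of the same shape.
    return 3 * (hidden_size * (input_size + hidden_size) + hidden_size)

# e.g. a 512-d frame feature fed to a 512-unit cell (illustrative sizes):
print(lstm_params(512, 512))  # 2099200
print(gru_params(512, 512))   # 1574400
```

The 3:4 ratio holds for any input and hidden size, which is one reason GRU-based captioning models are typically faster to train than otherwise identical LSTM-based ones.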

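The second abstract compares categorical cross-entropy with Kullback-Leibler divergence as the caption-word loss. The two are closely related: KL(p‖q) = H(p, q) − H(p), and since one-hot ground-truth word distributions have zero entropy, the losses coincide on hard targets (they differ only under soft targets such as label smoothing). A minimal sketch, assuming one-hot targets and an illustrative 3-word vocabulary:

```python
import math

def cross_entropy(p, q):
    # H(p, q) = -sum_i p_i * log(q_i)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    # KL(p || q) = sum_i p_i * log(p_i / q_i) = H(p, q) - H(p)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

target = [0.0, 1.0, 0.0]  # one-hot ground-truth word
pred   = [0.1, 0.7, 0.2]  # model's softmax output

print(cross_entropy(target, pred))  # 0.3566749...  (= -log 0.7)
print(kl_divergence(target, pred))  # 0.3566749...  identical for one-hot targets
```

With hard targets the two loss functions yield the same gradients, so any observed training differences between them in practice typically come from target smoothing or implementation details rather than the loss formula itself.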