Attention based Image Captioning using Depth-wise Separable Convolution

dc.contributor.authorMallick, V.R.
dc.contributor.authorNaik, D.
dc.date.accessioned2026-02-06T06:36:06Z
dc.date.issued2021
dc.description.abstractAutomatically generating descriptions for images has been a trending topic in computer vision, driven by real-life applications such as self-driving cars and Google image search. The backbone of this work is the encoder-decoder architecture of deep learning: the basic image captioning model uses a CNN as the encoder and an RNN as the decoder. Various deep CNNs such as VGG-16, VGG-19, ResNet, and Inception have been explored, but despite its comparatively better performance, Xception remains little used in this field. Likewise, the GRU has seen little use as a decoder, despite being comparatively faster than the LSTM. With these observations in mind, and motivated by the accuracy of Xception and the efficiency of the GRU, we propose an architecture for the image captioning task with Xception as the encoder and a GRU as the decoder, coupled with an attention mechanism. © 2021 IEEE.
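The abstract's key building block is the depth-wise separable convolution on which Xception is based: a per-channel (depth-wise) spatial filter followed by a 1×1 (point-wise) convolution that mixes channels, using far fewer parameters than a standard convolution (kh·kw·C_in + C_in·C_out versus kh·kw·C_in·C_out). A minimal NumPy sketch, for illustration only — the function name and tensor shapes are assumptions, not taken from the paper:

```python
import numpy as np

def depthwise_separable_conv(x, depthwise_k, pointwise_k):
    """Illustrative depth-wise separable convolution (valid padding, stride 1).

    x           : input feature map, shape (H, W, C_in)
    depthwise_k : one spatial filter per input channel, shape (kh, kw, C_in)
    pointwise_k : 1x1 channel-mixing weights, shape (C_in, C_out)
    """
    H, W, C_in = x.shape
    kh, kw, _ = depthwise_k.shape
    oh, ow = H - kh + 1, W - kw + 1

    # Depth-wise step: each channel is filtered independently,
    # so there is no cross-channel mixing here.
    dw = np.zeros((oh, ow, C_in))
    for c in range(C_in):
        for i in range(oh):
            for j in range(ow):
                dw[i, j, c] = np.sum(x[i:i + kh, j:j + kw, c] * depthwise_k[:, :, c])

    # Point-wise step: a 1x1 convolution is just a matrix product
    # over the channel dimension, mixing channels cheaply.
    return dw @ pointwise_k

# Parameter comparison for a 3x3 kernel, 3 -> 64 channels:
# standard conv: 3*3*3*64 = 1728 weights; separable: 3*3*3 + 3*64 = 219 weights.
out = depthwise_separable_conv(
    np.ones((4, 4, 3)), np.ones((3, 3, 3)), np.ones((3, 2))
)
# out has shape (2, 2, 2); each entry is 9 (depth-wise sum) * 3 (channel sum) = 27
```

The loops are deliberately explicit for clarity; a framework implementation (e.g. a separable convolution layer in a deep-learning library) would vectorize both steps.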
dc.identifier.citation2021 12th International Conference on Computing Communication and Networking Technologies, ICCCNT 2021, 2021
dc.identifier.urihttps://doi.org/10.1109/ICCCNT51525.2021.9579512
dc.identifier.urihttps://idr.nitk.ac.in/handle/123456789/30243
dc.publisherInstitute of Electrical and Electronics Engineers Inc.
dc.subjectAttention
dc.subjectBahdanau attention
dc.subjectConvolutional Neural Network [CNN]
dc.subjectDepth-wise Separable Convolution
dc.subjectGated Recurrent Unit [GRU]
dc.subjectInceptionV3
dc.subjectLong Short Term Memory [LSTM]
dc.subjectXception
dc.titleAttention based Image Captioning using Depth-wise Separable Convolution