Attention based Image Captioning using Depth-wise Separable Convolution
Date
2021
Publisher
Institute of Electrical and Electronics Engineers Inc.
Abstract
Automatically generating descriptions for images has been a trending topic in computer vision, because real-life applications such as self-driving cars and Google image search depend on it. The backbone of this work is the deep-learning encoder-decoder architecture: the basic image captioning model uses a CNN as the encoder and an RNN as the decoder. Various deep CNNs, such as VGG-16, VGG-19, ResNet, and Inception, have been explored as encoders, but Xception, despite its comparatively better performance, is not widely used in this field. Likewise, for the decoder, GRU has not been used much, despite being comparatively faster than LSTM. With these points in mind, and motivated by the accuracy of Xception and the efficiency of GRU, we propose an architecture for the image captioning task with Xception as the encoder and GRU as the decoder, combined with an attention mechanism. © 2021 IEEE.
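The decoding step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes additive (Bahdanau-style) attention, as listed in the keywords, applied over a grid of encoder feature vectors, followed by one standard GRU update. All names (`bahdanau_attention`, `gru_step`, `demo`), the weight matrices, and the dimensions are hypothetical and chosen only for illustration. Random arrays stand in for the Xception feature map and the learned parameters.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bahdanau_attention(features, hidden, W1, W2, v):
    """Additive attention.

    features: (L, D) encoder vectors, one per spatial location.
    hidden:   (H,)   current decoder state.
    Returns the context vector (D,) and attention weights (L,).
    """
    # score_i = v^T tanh(W1^T f_i + W2^T h)
    scores = np.tanh(features @ W1 + hidden @ W2) @ v
    weights = softmax(scores)
    context = weights @ features  # weighted sum of the feature vectors
    return context, weights

def gru_step(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU update (biases omitted for brevity)."""
    z = sigmoid(x @ Wz + h @ Uz)              # update gate
    r = sigmoid(x @ Wr + h @ Ur)              # reset gate
    h_cand = np.tanh(x @ Wh + (r * h) @ Uh)   # candidate state
    return (1.0 - z) * h + z * h_cand

def demo(seed=0):
    # Placeholder sizes (assumptions): L locations, D feature channels,
    # H hidden units, A attention units, E word-embedding size.
    L, D, H, A, E = 100, 128, 64, 32, 16
    rng = np.random.default_rng(seed)
    features = rng.standard_normal((L, D))    # stand-in for encoder output
    hidden = np.zeros(H)                      # initial decoder state
    word = rng.standard_normal(E)             # stand-in for a word embedding

    W1 = rng.standard_normal((D, A)) * 0.1
    W2 = rng.standard_normal((H, A)) * 0.1
    v = rng.standard_normal(A) * 0.1
    context, weights = bahdanau_attention(features, hidden, W1, W2, v)

    # Decoder input: attended image context concatenated with the word.
    x = np.concatenate([context, word])
    Wz, Wr, Wh = (rng.standard_normal((D + E, H)) * 0.1 for _ in range(3))
    Uz, Ur, Uh = (rng.standard_normal((H, H)) * 0.1 for _ in range(3))
    new_hidden = gru_step(x, hidden, Wz, Uz, Wr, Ur, Wh, Uh)
    return context, weights, new_hidden
```

In a full captioning model, the new hidden state would be projected onto the vocabulary to predict the next word, and the step would repeat with the updated state. The GRU has two gates where an LSTM has three plus a separate cell state, which is one reason it is often cheaper per step.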
Keywords
Attention, Bahdanau attention, Convolutional Neural Network [CNN], Depth-wise Separable Convolution, Gated Recurrent Unit [GRU], InceptionV3, Long Short Term Memory [LSTM], Xception
Citation
2021 12th International Conference on Computing Communication and Networking Technologies, ICCCNT 2021, 2021
