Attention based Image Captioning using Depth-wise Separable Convolution

Date

2021

Publisher

Institute of Electrical and Electronics Engineers Inc.

Abstract

Automatically generating descriptions for images has been one of the trending topics in computer vision, because real-life applications such as self-driving cars and Google image search depend on it. The backbone of this work is the encoder-decoder architecture of deep learning: the basic image captioning model uses a CNN as the encoder and an RNN as the decoder. Various deep CNNs such as VGG-16, VGG-19, ResNet, and Inception have been explored, but Xception, despite its comparatively better performance, is not widely used in this field. Likewise, for the decoder, GRU has not been used much despite being comparatively faster than LSTM. Motivated by the accuracy of Xception and the efficiency of GRU, we propose an architecture for the image captioning task with Xception as the encoder and GRU as the decoder, combined with an attention mechanism. © 2021 IEEE.
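
The attention step named in the abstract can be illustrated with a minimal sketch of additive (Bahdanau-style) attention. This is not the authors' implementation: the function names, the tiny hand-written weights, and the pure-Python list representation are all assumptions made for illustration. In a real model the feature vectors would come from the Xception encoder and the hidden state from the GRU decoder.

```python
import math

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def additive_attention(features, hidden, W1, W2, v):
    """Additive attention over image-region features (illustrative only).

    features: list of encoder feature vectors, one per image region
    hidden:   current decoder hidden state
    score_i = v . tanh(W1 f_i + W2 h); weights = softmax(scores)
    Returns the context vector (weighted sum of features) and the weights.
    """
    proj_h = matvec(W2, hidden)
    scores = []
    for f in features:
        proj_f = matvec(W1, f)
        scores.append(
            sum(vi * math.tanh(a + b) for vi, a, b in zip(v, proj_f, proj_h))
        )
    weights = softmax(scores)
    dim = len(features[0])
    context = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return context, weights

# Hypothetical toy inputs: two image regions, a one-dimensional hidden state.
features = [[1.0, 0.0], [0.0, 1.0]]
hidden = [0.0]
W1 = [[1.0, 0.0], [0.0, 1.0]]
W2 = [[0.0], [0.0]]
v = [1.0, 0.0]
context, weights = additive_attention(features, hidden, W1, W2, v)
```

At each decoding step the decoder would consume the context vector together with the previous word embedding, so the caption generator can focus on different image regions for different words.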

Keywords

Attention, Bahdanau attention, Convolutional Neural Network [CNN], Depth-wise Separable Convolution, Gated Recurrent Unit [GRU], InceptionV3, Long Short Term Memory [LSTM], Xception
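
As a rough illustration of why depth-wise separable convolution (the building block of Xception) is attractive, the sketch below compares weight counts for a standard convolution and a depth-wise separable one. The helper names and the example layer sizes are hypothetical and are not taken from the paper; bias terms are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    """Weights in a depth-wise separable convolution (bias ignored):
    one k x k filter per input channel, then a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise

# Hypothetical layer: 3 x 3 kernel, 128 input channels, 256 output channels.
standard = standard_conv_params(3, 128, 256)     # 294912
separable = separable_conv_params(3, 128, 256)   # 33920
ratio = standard / separable
```

For this example layer the separable form needs roughly 8.7 times fewer weights.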

Citation

2021 12th International Conference on Computing Communication and Networking Technologies, ICCCNT 2021, 2021
