Attention based Image Captioning using Depth-wise Separable Convolution
| dc.contributor.author | Mallick, V.R. | |
| dc.contributor.author | Naik, D. | |
| dc.date.accessioned | 2026-02-06T06:36:06Z | |
| dc.date.issued | 2021 | |
| dc.description.abstract | Automatically generating descriptions for images is an active topic in Computer Vision, because real-life applications such as self-driving cars and Google image search depend on it. The backbone of this work is the encoder-decoder architecture of deep learning. The basic image captioning model uses a CNN as the encoder and an RNN as the decoder. Various deep CNNs such as VGG-16, VGG-19, ResNet, and Inception have been explored, but Xception, despite its comparatively better performance, is not widely used in this field. Likewise, for the decoder, GRU has not been used much, despite being comparatively faster than LSTM. Motivated by the accuracy of Xception and the efficiency of GRU, we propose an architecture for the image captioning task with Xception as the encoder and GRU as the decoder, with an attention mechanism. © 2021 IEEE. | |
| dc.identifier.citation | 2021 12th International Conference on Computing Communication and Networking Technologies, ICCCNT 2021, 2021 | |
| dc.identifier.uri | https://doi.org/10.1109/ICCCNT51525.2021.9579512 | |
| dc.identifier.uri | https://idr.nitk.ac.in/handle/123456789/30243 | |
| dc.publisher | Institute of Electrical and Electronics Engineers Inc. | |
| dc.subject | Attention | |
| dc.subject | Bahdanau attention | |
| dc.subject | Convolutional Neural Network [CNN] | |
| dc.subject | Depth-wise Separable Convolution | |
| dc.subject | Gated Recurrent Unit [GRU] | |
| dc.subject | InceptionV3 | |
| dc.subject | Long Short Term Memory [LSTM] | |
| dc.subject | Xception | |
| dc.title | Attention based Image Captioning using Depth-wise Separable Convolution |
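
The title and subject terms refer to depth-wise separable convolution, the building block that Xception is named for. As a rough illustration of why it is attractive for an encoder, the sketch below compares weight counts for a standard convolution and a depth-wise separable one. It is not taken from the paper: the function names and the example layer sizes (3x3 kernel, 128 input channels, 256 output channels) are assumptions chosen for illustration, and bias terms are ignored.

```python
def standard_conv_params(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Weights in a standard 2D convolution (bias ignored).

    Every output channel has its own kernel_size x kernel_size filter
    over all input channels.
    """
    return kernel_size * kernel_size * in_channels * out_channels


def separable_conv_params(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Weights in a depth-wise separable 2D convolution (bias ignored).

    Depth-wise step: one kernel_size x kernel_size filter per input channel.
    Point-wise step: a 1x1 convolution mixing in_channels into out_channels.
    """
    depthwise = kernel_size * kernel_size * in_channels
    pointwise = in_channels * out_channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer sizes, chosen only for illustration.
    k, c_in, c_out = 3, 128, 256
    standard = standard_conv_params(k, c_in, c_out)
    separable = separable_conv_params(k, c_in, c_out)
    print(f"standard:  {standard} weights")
    print(f"separable: {separable} weights")
    print(f"reduction: {standard / separable:.2f}x")
```

For these assumed sizes the separable layer needs roughly an order of magnitude fewer weights; the actual saving in any real network depends on its layer shapes.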
