Conference Papers

Permanent URI for this collectionhttps://idr.nitk.ac.in/handle/123456789/28506

Browse

Search Results

Now showing 1 - 2 of 2

Attention based Image Captioning using Depth-wise Separable Convolution
(Institute of Electrical and Electronics Engineers Inc., 2021) Mallick, V.R.; Naik, D.
Automatically generating descriptions for an image has been one of the trending topics in the field of Computer Vision. This is due to the fact that various real-life applications like self-driving cars, Google image search, etc. are dependent on it. The backbone of this work is the encoder-decoder architecture of deep learning. The basic image captioning model has CNN as an encoder and RNN as a decoder. Various deep CNNs like VGG-16 and VGG-19, ResNet, Inception have been explored but despite the comparatively better performance, Xception is not that familiar in this field. Again for the decoder, GRU is not been used much, despite being comparatively faster than LSTM. Keeping these things in mind, and being attracted by the accuracy of Xception and efficiency of GRU, we propose an architecture for image captioning task with Xception as encoder and GRU as decoder with an attention mechanism. Â© 2021 IEEE.
Describing Image with Attention based GRU
(Institute of Electrical and Electronics Engineers Inc., 2021) Mallick, V.R.; Naik, D.
Generating descriptions for images are popular research topic in current world. Based on encoder-decoder model, CNN works as an encoder to encode the images and then passes it to decoder RNN as input to generate the image description in natural language sentences. LSTM is widely used as RNN decoder. Attention mechanism has also played an important role in this field by enhancing the object detection. Inspired by this recent advancement in this field of computer vision, we used GRU in place of LSTM as a decoder for our image captioning model. We incorporated attention mechanism with GRU decoder to enhance the precision of generated captions. GRU have lesser tensor operations in comparison to LSTM, hence it will be faster in training. Â© 2021 IEEE.

Conference Papers

Browse

Filters

Settings

Sort By

Results per page

Search Results