Comparison of Image Encoder Architectures for Image Captioning

Maru, H.; Chandana, T.S.S.; Naik, D.

dc.contributor.author	Maru, H.
dc.contributor.author	Chandana, T.S.S.
dc.contributor.author	Naik, D.
dc.date.accessioned	2026-02-06T06:35:56Z
dc.date.issued	2021
dc.description.abstract	Image captioning is a challenging task that combines two fields of computer science: Natural Language Processing and Computer Vision. Several approaches have been tried for this task, many of which pair Convolutional Neural Networks (CNNs) as encoders with Recurrent Neural Networks (RNNs) as decoders. This paper compares two CNN-based encoders, VGG16 (Visual Geometry Group) and InceptionV3, for encoding the input image features. The work uses the popular Flickr8k dataset and also compares two loss functions with each of these architectures: Categorical Cross Entropy and Kullback-Leibler Divergence. Our main goal is to study the effect of the image encoder and the loss function on the image captioning task while keeping all other parameters the same. © 2021 IEEE.
dc.identifier.citation	Proceedings - 5th International Conference on Computing Methodologies and Communication, ICCMC 2021, 2021, p. 740-744
dc.identifier.uri	https://doi.org/10.1109/ICCMC51019.2021.9418234
dc.identifier.uri	https://idr.nitk.ac.in/handle/123456789/30150
dc.publisher	Institute of Electrical and Electronics Engineers Inc.
dc.subject	VGG16 (Visual Geometry Group)
dc.subject	Convolutional Neural Networks (CNN)
dc.subject	Encoder-decoder framework
dc.subject	Image captioning
dc.subject	InceptionV3
dc.title	Comparison of Image Encoder Architectures for Image Captioning
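
The abstract names two loss functions, Categorical Cross Entropy and Kullback-Leibler Divergence. The sketch below is an illustration of how the two relate, not the authors' implementation. It uses the standard identity H(p, q) = KL(p || q) + H(p), so the two losses coincide when the target p is one-hot and differ by the target entropy otherwise. The 4-word vocabulary, the probability values, and all function names are hypothetical. This record does not state how the paper's targets were encoded.

```python
import math


def categorical_cross_entropy(target, predicted, eps=1e-12):
    """H(p, q) = -sum_i p_i * log(q_i), with q clipped away from zero."""
    return -sum(p * math.log(max(q, eps)) for p, q in zip(target, predicted))


def kl_divergence(target, predicted, eps=1e-12):
    """KL(p || q) = sum_i p_i * log(p_i / q_i); terms with p_i == 0 add 0."""
    return sum(
        p * math.log(p / max(q, eps))
        for p, q in zip(target, predicted)
        if p > 0
    )


def entropy(target):
    """H(p) = -sum_i p_i * log(p_i), skipping zero-probability entries."""
    return -sum(p * math.log(p) for p in target if p > 0)


# Hypothetical softmax output over a 4-word vocabulary at one decoding step.
predicted = [0.7, 0.1, 0.1, 0.1]

# One-hot target: the ground-truth next word is index 0.
one_hot = [1.0, 0.0, 0.0, 0.0]

# Smoothed target (assumed here only to show where the two losses diverge).
smoothed = [0.85, 0.05, 0.05, 0.05]

ce_one_hot = categorical_cross_entropy(one_hot, predicted)
kl_one_hot = kl_divergence(one_hot, predicted)
ce_smoothed = categorical_cross_entropy(smoothed, predicted)
kl_smoothed = kl_divergence(smoothed, predicted)

print(f"one-hot : CE={ce_one_hot:.4f}  KL={kl_one_hot:.4f}")
print(f"smoothed: CE={ce_smoothed:.4f}  KL={kl_smoothed:.4f}  H(p)={entropy(smoothed):.4f}")
```

With one-hot targets, the two loss values are equal in exact arithmetic, so any difference seen in training would come from implementation details such as clipping rather than from the objective itself.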

Collections

Conference Papers
