Visual Question Answering Using Convolutional and Recurrent Neural Networks

Date

2023

Publisher

Springer Science and Business Media Deutschland GmbH

Abstract

This paper presents a methodology for generating answers to questions about the input images in a dataset. The proposed model consists of two major components, whose analysis results and features are integrated to predict the answers. We have created a pipeline that first preprocesses the dataset and then encodes the question and answer strings. The text data is processed with NLP techniques such as tokenization and stemming to form a vocabulary set. A second experiment, with a modified model and approach, was performed using the publicly available easy-VQA dataset. This model used the bag-of-words technique to turn a question into a vector. This approach used two separate components for text and image feature extraction and merged their outputs, by element-wise multiplication, to generate an answer. In both approaches, we used the softmax activation function in the output layer to generate the answer to the question. Compared with existing methodologies, the approach appears comparable and gives reasonable results. © 2023, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
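
The steps named in the abstract (bag-of-words question encoding, element-wise multiplication of text and image features, and a softmax output) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy questions, and the placeholder feature vectors are all assumptions made for illustration, stemming is omitted, and a real model would obtain the image features from a CNN and pass the merged vector through learned layers before the softmax.

```python
import math
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocab(questions):
    """Assign each distinct token an index, in order of first appearance."""
    vocab = {}
    for question in questions:
        for token in tokenize(question):
            vocab.setdefault(token, len(vocab))
    return vocab


def bag_of_words(question, vocab):
    """Encode a question as a vector of token counts over the vocabulary."""
    vector = [0.0] * len(vocab)
    for token in tokenize(question):
        if token in vocab:  # tokens outside the vocabulary are ignored
            vector[vocab[token]] += 1.0
    return vector


def merge(text_features, image_features):
    """Combine two equal-length feature vectors by element-wise multiplication."""
    if len(text_features) != len(image_features):
        raise ValueError("feature vectors must have the same length")
    return [t * i for t, i in zip(text_features, image_features)]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Hypothetical training questions, used only to build a toy vocabulary.
    vocab = build_vocab(["What shape is this?", "Is there a circle?"])
    question_vec = bag_of_words("Is this a circle?", vocab)

    # Placeholder standing in for CNN image features of matching length.
    image_vec = [0.5] * len(vocab)

    merged = merge(question_vec, image_vec)

    # For illustration only, the merged vector is treated directly as
    # answer scores; a trained model would apply dense layers first.
    print(softmax(merged))
```

Element-wise multiplication requires both feature vectors to have the same length, which is why the sketch checks the lengths before merging.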

Keywords

CNN, Question-answering, VQA

Citation

Lecture Notes in Electrical Engineering, 2023, Vol. 998 LNEE, pp. 23-33
