Azade, A.; Saini, R.; Naik, D.
2026-02-06
2023
Lecture Notes in Electrical Engineering, 2023, Vol. 998 LNEE, p. 23-33
18761100
https://doi.org/10.1007/978-981-99-0047-3_3
https://idr.nitk.ac.in/handle/123456789/29565

This paper presents a methodology for generating answers to questions about the input images in a dataset. The proposed model consists of two major components whose features and analysis results are then combined to predict the answers. We built a pipeline that first preprocesses the dataset and then encodes the question and answer strings. The text is processed with NLP techniques such as tokenization and stemming to form a vocabulary. A further experiment, with a modified model and approach, was performed on the publicly available easy-VQA dataset. This model uses the bag-of-words technique to turn a question into a vector. It extracts text and image features in two separate components and merges them by element-wise multiplication to generate an answer. In these approaches, a softmax activation function in the output layer produces the answer to the question. Compared with existing methodologies, this approach appears comparable and gives decent results.

© 2023, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

CNN; Question-answering; VQA

Visual Question Answering Using Convolutional and Recurrent Neural Networks
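
The easy-VQA variant described in the abstract (bag-of-words question vector, element-wise multiplication with the image features, softmax output) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the weights are random and untrained, the image feature vector is a placeholder standing in for the output of a CNN, and the hidden size, the `tanh` activation, the answer list, and all function names are assumptions made for the example. It uses only the Python standard library to stay self-contained.

```python
import math
import random


def tokenize(question):
    """Lowercase, drop the question mark, split on whitespace."""
    return question.lower().replace("?", "").split()


def build_vocab(questions):
    """Map each distinct token to a fixed index (sorted for determinism)."""
    tokens = sorted({tok for q in questions for tok in tokenize(q)})
    return {tok: i for i, tok in enumerate(tokens)}


def bag_of_words(question, vocab):
    """Count how often each vocabulary token occurs in the question."""
    vec = [0.0] * len(vocab)
    for tok in tokenize(question):
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    return vec


def dense(weights, x, activation=None):
    """Matrix-vector product, optionally followed by an activation."""
    out = [sum(w * v for w, v in zip(row, x)) for row in weights]
    return [activation(o) for o in out] if activation else out


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(image_features, question_vec, params):
    """Project both modalities to one size, multiply element-wise, classify."""
    h_img = dense(params["w_img"], image_features, math.tanh)
    h_txt = dense(params["w_txt"], question_vec, math.tanh)
    merged = [a * b for a, b in zip(h_img, h_txt)]  # element-wise merge
    return softmax(dense(params["w_out"], merged))


def random_matrix(rows, cols, rng):
    """Untrained placeholder weights (assumption: no training shown here)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
questions = ["What shape is this?", "What color is the shape?"]
answers = ["circle", "rectangle", "red", "blue"]  # hypothetical answer set

vocab = build_vocab(questions)
q_vec = bag_of_words(questions[1], vocab)
image_features = [rng.random() for _ in range(8)]  # placeholder for CNN output

hidden = 5  # assumed shared hidden size
params = {
    "w_img": random_matrix(hidden, len(image_features), rng),
    "w_txt": random_matrix(hidden, len(vocab), rng),
    "w_out": random_matrix(len(answers), hidden, rng),
}

probs = predict(image_features, q_vec, params)
best = answers[max(range(len(probs)), key=probs.__getitem__)]
print(best, probs)
```

Because the weights are untrained, the predicted answer is arbitrary; the sketch only shows how the two feature vectors are brought to the same size so that they can be multiplied element by element before the softmax layer turns the result into a probability for each candidate answer.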