Conference Papers

Permanent URI for this collection: https://idr.nitk.ac.in/handle/123456789/28506


Now showing 1 - 4 of 4
  • Item
    COVID-19 Prediction Using Chest X-rays Images
    (Institute of Electrical and Electronics Engineers Inc., 2021) Kumar, A.; Sharma, N.; Naik, D.
    Understanding COVID-19 became very important since large-scale vaccination was not yet possible. Chest X-ray is the first imaging technique to play an important role in the diagnosis of COVID-19. Convolutional neural networks (CNNs) have achieved great success in image recognition and classification across many fields. However, due to the limited availability of annotated medical images, medical image classification remains a major challenge in medical diagnosis. The proposed work performs transfer learning with deep learning models such as ResNet50 and VGG16 and compares their performance with a newly developed CNN-based model. ResNet50 and VGG16 are state-of-the-art models that have been used extensively, so a comparative analysis with them indicates how well the new model performs. This work also develops a CNN model, as CNNs are expected to perform well on image classification problems. The Kaggle radiography dataset is used for training, validation, and testing. In addition, another X-ray image dataset, compiled from two different sources, is used. The results show that the newly developed CNN model outperforms the VGG16 and ResNet50 models. © 2021 IEEE.
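The transfer-learning setup this abstract describes (a pretrained base kept fixed, with only a new classifier head trained on the small medical dataset) can be illustrated with a minimal numpy sketch. This is not the paper's code: the frozen ResNet50/VGG16 base is stood in for by a fixed random ReLU projection, and the data is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

def frozen_features(x, W_base):
    """Stand-in for a frozen pretrained base: weights are never updated."""
    return np.maximum(x @ W_base, 0.0)  # ReLU features

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Synthetic stand-in data: 200 "images" flattened to 64 dims, binary labels.
X = rng.normal(size=(200, 64))
y = (X @ rng.normal(size=64) > 0).astype(float)

W_base = rng.normal(size=(64, 32)) / 8.0   # frozen "pretrained" weights
w_head = np.zeros(32)                      # trainable classifier head

F = frozen_features(X, W_base)             # extract features once, reuse every step
losses = []
for _ in range(300):                       # gradient descent on the head only
    p = sigmoid(F @ w_head)
    w_head -= 0.2 * (F.T @ (p - y)) / len(y)
    losses.append(-np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9)))

acc = np.mean((sigmoid(F @ w_head) > 0.5) == y)
```

Because the base is frozen, features can be computed once up front, which is what makes transfer learning practical when annotated medical images are scarce.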
  • Item
    Handwritten Text Recognition from an Image with Android Application
    (Institute of Electrical and Electronics Engineers Inc., 2022) Mule, H.; Kadam, N.; Naik, D.
    Nowadays, storing information from handwritten documents for future use is becoming necessary. An easy way to store such information is to capture handwritten documents and save them in image format. Recognizing the text or characters present in an image is called Optical Character Recognition. Text extraction from images remains challenging in recent research due to stroke variation, inconsistent writing styles, cursive handwriting, etc. In this work, we propose a CNN and BiLSTM model for text recognition. The model is evaluated on the IAM dataset and achieves 92% character recognition accuracy. To increase usability, the model is deployed to Firebase as a custom model. We have developed an Android application that allows the user to capture or browse an image, extract the text from it by calling the Firebase model, and save the text to a file; the user can browse to an appropriate location to store the file. The proposed model works on both printed and handwritten text. © 2022 IEEE.
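The abstract reports 92% character recognition accuracy on IAM. A common way to compute such a figure (assumed here, since the abstract does not give the exact definition) is 1 minus the character error rate, i.e. the edit distance between the recognized string and the reference, normalized by the reference length. A self-contained sketch:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings, by dynamic programming."""
    m, n = len(ref), len(hyp)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i                       # deleting i chars from ref
    for j in range(n + 1):
        dp[0][j] = j                       # inserting j chars of hyp
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution/match
    return dp[m][n]

def char_accuracy(ref, hyp):
    """Character recognition accuracy = 1 - (edit distance / reference length)."""
    return 1.0 - edit_distance(ref, hyp) / max(len(ref), 1)
```

For example, one substituted character in an 11-character reference gives an accuracy of 10/11, roughly 0.909.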
  • Item
    Visual Question Answering Using Convolutional and Recurrent Neural Networks
    (Springer Science and Business Media Deutschland GmbH, 2023) Azade, A.; Saini, R.; Naik, D.
    This paper presents a methodology for generating answers to questions posed about input images in a dataset. The proposed model consists of two major components; the analysis results and features from these components are then integrated to predict the answers. We have created a pipeline that first preprocesses the dataset and then encodes the question and answer strings. Using NLP techniques such as tokenization and stemming, the text data is processed to form a vocabulary set. A further experiment with a modified model and approach was performed using the easy-VQA dataset, which is publicly available. This model used the bag-of-words technique to turn a question into a vector. The approach extracts text and image features with two separate components and merges them, using element-wise multiplication, to generate an answer. In these approaches, we use the softmax activation function in the output layer to generate the answer to the question. Compared with existing methodologies, this approach is comparable and gives decent results. © 2023, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
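The fusion step this abstract describes, element-wise multiplication of the image and question feature vectors followed by a softmax over the answer vocabulary, can be sketched in a few lines of numpy. The feature vectors, weights, and answer list below are all illustrative placeholders, not values from the paper.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                    # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def predict_answer(img_feat, q_feat, W_out, answers):
    """Merge image and question features element-wise, then classify."""
    fused = img_feat * q_feat          # element-wise multiplication merge
    probs = softmax(W_out @ fused)     # softmax over the answer vocabulary
    return answers[int(np.argmax(probs))], probs

rng = np.random.default_rng(1)
img_feat = rng.normal(size=16)         # stand-in for CNN image features
q_feat = rng.normal(size=16)           # stand-in for encoded question
answers = ["yes", "no", "red", "blue"] # toy answer vocabulary
W_out = rng.normal(size=(len(answers), 16))
ans, probs = predict_answer(img_feat, q_feat, W_out, answers)
```

Element-wise multiplication forces the two modalities to agree dimension by dimension: a feature contributes to the fused vector only when it is active in both the image and the question encoding.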
  • Item
    Video to Text Generation Using Sentence Vector and Skip Connections
    (Springer, 2023) Mule, H.; Naik, D.
    Nowadays, video data is growing rapidly, creating a need for robust algorithms to interpret it; a textual alternative is more effective and saves time. We aim to produce captions for videos. The most widely used architecture for this task is the encoder-decoder (E-D) model. Recent attempts have focused on improving performance by including 3D-CNNs, transformers, or structural changes to the basic LSTM units used in the E-D model. In this work, sentence vectors are used to improve the E-D model's performance: a sentence vector is generated from the video file and used by the decoder, together with previously generated words, to produce an accurate description. A skip connection in the encoder avoids the vanishing gradient problem. All of our studies use the MSVD and CHARADES datasets, and four well-known metrics, BLEU@4, METEOR, ROUGE, and CIDEr, are used for performance evaluation. We have compared the performance of BERT, ELMo, and GloVe word embeddings; in experimental analysis, BERT embeddings outperformed ELMo and GloVe. For feature extraction, the pretrained CNNs NASNet-Large, VGG-16, Inception-v4, and ResNet152 are used, with NASNet-Large outperforming the other models. © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
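The claim that a skip connection in the encoder avoids vanishing gradients can be demonstrated with a scalar toy model, independent of the paper's actual architecture. Each "layer" below is a tanh transform; without the skip, the backpropagated gradient is a product of factors below 1 and shrinks with depth, while with the skip (y = x + f(x)) each factor is 1 plus a positive term, so the gradient through the identity path survives.

```python
import numpy as np

def grad_through_stack(depth, w, skip):
    """Backprop factor through `depth` scalar tanh layers, with/without skips."""
    x, g = 2.0, 1.0
    for _ in range(depth):
        h = np.tanh(w * x)
        local = w * (1.0 - h * h)      # d tanh(w*x) / dx
        if skip:
            x, g = x + h, g * (1.0 + local)  # residual: y = x + f(x)
        else:
            x, g = h, g * local              # plain: y = f(x)
    return g

g_plain = grad_through_stack(20, 0.5, skip=False)  # shrinks toward zero
g_skip = grad_through_stack(20, 0.5, skip=True)    # stays above one
```

The same mechanism is why residual connections let much deeper encoders train stably; this sketch only illustrates the gradient arithmetic, not the paper's E-D model.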