Conference Papers

Permanent URI for this collection: https://idr.nitk.ac.in/handle/123456789/28506

Now showing 1 - 3 of 3
  • Item
    COVID-19 Prediction Using Chest X-rays Images
    (Institute of Electrical and Electronics Engineers Inc., 2021) Kumar, A.; Sharma, N.; Naik, D.
    Understanding COVID-19 became very important since large-scale vaccination was not possible. The chest X-ray is the first imaging technique to play an important role in the diagnosis of COVID-19. Convolutional neural networks (CNNs) have achieved great success in image recognition and classification across many fields. However, due to the limited availability of annotated medical images, the classification of medical images remains the biggest challenge in medical diagnosis. The proposed work performs transfer learning using deep learning models such as ResNet50 and VGG16 and compares their performance with a newly developed CNN-based model. ResNet50 and VGG16 are state-of-the-art models that have been used extensively, so a comparative analysis against them indicates how good the proposed model is. This work also develops a CNN model, as such models are expected to perform very well on image classification problems. The Kaggle radiography dataset is used for training, validation, and testing. In addition, another X-ray image dataset, created from two different sources, is used. The results show that the proposed CNN model outperforms both the VGG16 and ResNet50 models. © 2021 IEEE.
  • Item
    Loss Optimised Video Captioning using Deep-LSTM, Attention Mechanism and Weighted Loss Metrices
    (Institute of Electrical and Electronics Engineers Inc., 2021) Yadav, N.; Naik, D.
    The aim of the video captioning task is to describe video content with multiple natural-language sentences. Videos carry photographic, graphical, and auditory data. Our goal is to investigate and recognise the video's visual features and to create a caption so that anyone can grasp the video's information within a second. Although encoder-decoder models have made significant progress, they still need many improvements. In the present work, we enhance the top-down architecture using Bahdanau attention, Deep Long Short-Term Memory (Deep-LSTM), and a weighted loss function. VGG16 is used to extract features from the frames, and Deep-LSTM is paired with an attention mechanism to understand the actions in the video. On the MSVD dataset, our model shows a major improvement over other state-of-the-art models. © 2021 IEEE.
  • Item
    Generating Short Video Description using Deep-LSTM and Attention Mechanism
    (Institute of Electrical and Electronics Engineers Inc., 2021) Yadav, N.; Naik, D.
    Nowadays, an extensive amount of video data is produced, because most people have video-capturing devices such as mobile phones and cameras. A video comprises photographic, textual, and auditory data. Our aim is to investigate and recognise the visual features of a video and to generate a caption so that users can get the information in the video in an instant. Many techniques capture the static content of a frame, but for video captioning, dynamic information is more important than static information. In this work, we introduce an encoder-decoder architecture using Deep Long Short-Term Memory (Deep-LSTM) and Bahdanau attention. In the encoder, the Convolutional Neural Network (CNN) VGG16 and a Deep-LSTM deduce information from the frames, and a Deep-LSTM combined with an attention mechanism describes the actions performed in the video. We evaluated our model on the MSVD dataset, where it shows a significant improvement over other video captioning models. © 2021 IEEE.
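
The transfer-learning setup in the first abstract (a frozen pretrained backbone such as ResNet50 or VGG16, with only a new classification head trained on the medical images) can be illustrated in miniature. The sketch below is a hedged NumPy toy, not the paper's code: the "frozen backbone" is simulated by a fixed random projection standing in for pretrained CNN features, the data is synthetic, and all sizes are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Frozen backbone": a fixed random projection standing in for the
# pretrained ResNet50/VGG16 feature extractor (weights never updated).
backbone_W = rng.normal(size=(64, 16))

def extract_features(x):
    return np.maximum(x @ backbone_W, 0.0)  # ReLU feature maps

# Synthetic two-class data: 100 flattened 64-dim "images" with toy labels.
X = rng.normal(size=(100, 64))
y = (X[:, 0] > 0).astype(float)

F = extract_features(X)  # features from the frozen backbone

# Trainable head: logistic regression updated by gradient descent.
w, b = np.zeros(16), 0.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w, b):
    p = sigmoid(F @ w + b)
    return -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))

initial_loss = loss(w, b)
for _ in range(200):
    p = sigmoid(F @ w + b)
    w -= 0.1 * (F.T @ (p - y) / len(y))  # only the head is trained
    b -= 0.1 * np.mean(p - y)
final_loss = loss(w, b)
```

The key property of transfer learning survives even at this scale: only the small head is optimised, while the (simulated) backbone stays fixed, which is what makes training feasible on limited annotated medical data.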
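
The weighted loss in the second abstract can be read as a weighted cross-entropy over caption words. The abstract does not specify the weighting scheme, so the sketch below assumes inverse-frequency word weights over a toy vocabulary; the counts, probabilities, and targets are all invented for illustration.

```python
import numpy as np

# Toy vocabulary frequencies: common words get low weight, rare words high.
counts = np.array([100.0, 50.0, 10.0, 5.0])
weights = counts.sum() / (len(counts) * counts)  # inverse-frequency weights

def weighted_cross_entropy(probs, targets, weights):
    """Mean cross-entropy where each target word is scaled by its weight."""
    picked = probs[np.arange(len(targets)), targets]
    return np.mean(weights[targets] * -np.log(picked + 1e-9))

# Predicted word distributions for a 3-word caption (rows sum to 1).
probs = np.array([
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.6, 0.2, 0.1],
    [0.2, 0.2, 0.5, 0.1],
])
targets = np.array([0, 1, 2])

plain_loss = weighted_cross_entropy(probs, targets, np.ones(4))
weighted_loss = weighted_cross_entropy(probs, targets, weights)
```

Here the third target word is rare (count 10), so its prediction error is amplified, pushing the model to caption infrequent but informative words correctly.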
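
Both captioning abstracts pair the Deep-LSTM decoder with Bahdanau (additive) attention: at each decoding step, every per-frame encoder feature is scored against the decoder state, the scores are softmax-normalised into weights, and the weighted sum of frame features becomes the context vector. A minimal NumPy sketch of that one step; the dimensions are illustrative, not the papers' hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(1)

T, enc_dim, dec_dim, att_dim = 8, 32, 32, 16  # illustrative sizes

enc_states = rng.normal(size=(T, enc_dim))  # one VGG16/LSTM feature per frame
dec_state = rng.normal(size=(dec_dim,))     # current decoder hidden state

# Bahdanau additive attention parameters (randomly initialised here;
# in a real model they are learned jointly with the decoder).
W1 = rng.normal(size=(enc_dim, att_dim))
W2 = rng.normal(size=(dec_dim, att_dim))
v = rng.normal(size=(att_dim,))

# score_t = v . tanh(W1 h_t + W2 s) for each frame t
scores = np.tanh(enc_states @ W1 + dec_state @ W2) @ v

# Softmax over frames gives the attention weights.
attn = np.exp(scores - scores.max())
attn /= attn.sum()

# Context vector: attention-weighted sum of frame features,
# fed to the Deep-LSTM decoder when predicting the next caption word.
context = attn @ enc_states
```

Because the weights form a distribution over frames, the decoder can emphasise the frames where the described action actually happens, which is why this mechanism helps with the dynamic (rather than static) content the third abstract highlights.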