Generating Short Video Description using Deep-LSTM and Attention Mechanism

dc.contributor.author: Yadav, N.
dc.contributor.author: Naik, D.
dc.date.accessioned: 2026-02-06T06:35:56Z
dc.date.issued: 2021
dc.description.abstract: Nowadays, an extensive amount of video data is produced because most people have video-capturing devices such as mobile phones and cameras. A video comprises visual, textual, and auditory data. Our aim is to investigate and recognize the visual features of a video and generate a caption so that users can grasp the content of the video in an instant. Many techniques capture the static content of a frame, but for video captioning, dynamic information is more important than static information. In this work, we introduce an Encoder-Decoder architecture using Deep Long Short-Term Memory (Deep-LSTM) and Bahdanau attention. In the encoder, a Convolutional Neural Network (CNN), VGG16, together with a Deep-LSTM is used to extract information from frames, while a Deep-LSTM combined with the attention mechanism describes the action performed in the video. We evaluated the performance of our model on the MSVD dataset, which shows significant improvement over other video captioning models. © 2021 IEEE.
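The abstract describes a frame-level CNN encoder feeding a stacked LSTM, with a Bahdanau-attention LSTM decoder producing the caption. The following is a minimal sketch of that general architecture, not the authors' implementation: the 4096-d VGG16 feature size, the hidden/embedding dimensions, the two-layer depth, and all class and parameter names are illustrative assumptions.

```python
# Minimal sketch (assumed configuration, not the paper's exact model):
# encoder = stacked LSTM over per-frame VGG16 features,
# decoder = stacked LSTM with additive (Bahdanau) attention over encoder outputs.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, feat_dim=4096, hidden_dim=512, num_layers=2):
        super().__init__()
        # Deep (stacked) LSTM over the sequence of per-frame CNN features.
        self.lstm = nn.LSTM(feat_dim, hidden_dim, num_layers, batch_first=True)

    def forward(self, frame_feats):              # frame_feats: (B, T, feat_dim)
        outputs, state = self.lstm(frame_feats)  # outputs: (B, T, hidden_dim)
        return outputs, state

class BahdanauAttention(nn.Module):
    def __init__(self, hidden_dim=512):
        super().__init__()
        self.W_enc = nn.Linear(hidden_dim, hidden_dim)
        self.W_dec = nn.Linear(hidden_dim, hidden_dim)
        self.v = nn.Linear(hidden_dim, 1)

    def forward(self, dec_hidden, enc_outputs):  # (B, H), (B, T, H)
        # Additive scoring: v^T tanh(W_enc h_t + W_dec s)
        scores = self.v(torch.tanh(self.W_enc(enc_outputs)
                                   + self.W_dec(dec_hidden).unsqueeze(1)))  # (B, T, 1)
        weights = torch.softmax(scores, dim=1)
        context = (weights * enc_outputs).sum(dim=1)                        # (B, H)
        return context, weights

class Decoder(nn.Module):
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.attention = BahdanauAttention(hidden_dim)
        self.lstm = nn.LSTM(embed_dim + hidden_dim, hidden_dim, num_layers, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, prev_word, state, enc_outputs):
        # One decoding step: attend over encoder outputs using the top-layer
        # hidden state, then feed [word embedding; context] into the LSTM.
        emb = self.embed(prev_word)                           # (B, embed_dim)
        context, _ = self.attention(state[0][-1], enc_outputs)
        x = torch.cat([emb, context], dim=-1).unsqueeze(1)    # (B, 1, embed+hidden)
        output, state = self.lstm(x, state)
        logits = self.out(output.squeeze(1))                  # (B, vocab_size)
        return logits, state
```

At inference time the decoder would be run step by step, starting from a start-of-sentence token and the encoder's final state, with each step's most probable word fed back as the next input.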
dc.identifier.citation: 2021 6th International Conference for Convergence in Technology, I2CT 2021, 2021.
dc.identifier.uri: https://doi.org/10.1109/I2CT51068.2021.9417907
dc.identifier.uri: https://idr.nitk.ac.in/handle/123456789/30156
dc.publisher: Institute of Electrical and Electronics Engineers Inc.
dc.subject: Computer Vision
dc.subject: Machine Translation
dc.subject: Natural Language Processing
dc.subject: Recurrent Neural Network
dc.subject: Video Captioning
dc.title: Generating Short Video Description using Deep-LSTM and Attention Mechanism
