Video to Text Generation Using Sentence Vector and Skip Connections

dc.contributor.authorMule, H.
dc.contributor.authorNaik, D.
dc.date.accessioned2026-02-06T06:34:51Z
dc.date.issued2023
dc.description.abstractNowadays, video data is growing rapidly, and robust algorithms are needed to interpret it. A textual alternative is more efficient and saves time. We aim to generate captions for videos. The most popular architecture for this task is the encoder-decoder (E-D) model. Recent attempts have focused on improving performance by adding 3D-CNNs, transformers, or structural changes to the basic LSTM units used in the E-D. In this work, sentence vectors are used to improve the E-D model's performance: a sentence vector is generated from the video and used by the decoder, together with previously generated words, to produce an accurate description. Skip connections in the encoder avoid the vanishing-gradient problem. All of our studies use the MSVD and CHARADES datasets. Four well-known metrics, BLEU@4, METEOR, ROUGE, and CIDEr, are used for performance evaluation. We compared the performance of BERT, ELMo, and GloVe word embeddings; in our experiments, BERT embeddings outperformed ELMo and GloVe. For feature extraction, pretrained CNNs (NASNet-Large, VGG-16, Inception-v4, and ResNet-152) are used, and NASNet-Large outperformed the other models. © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
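The skip connection the abstract describes can be sketched in a few lines. This is not the authors' code; it is a minimal, self-contained illustration of the general idea that a skip connection adds a block's input back to its output, giving gradients a direct path around the transform. The toy `layer` function (elementwise scaling standing in for an LSTM/CNN block) is a hypothetical stand-in.

```python
def layer(x, w):
    # Toy "layer": elementwise scaling standing in for an LSTM/CNN block.
    return [wi * xi for wi, xi in zip(w, x)]

def encoder_block_with_skip(x, w):
    # Skip connection: output = f(x) + x.
    # Because the identity term passes gradients through unchanged,
    # deep stacks of such blocks are less prone to vanishing gradients.
    h = layer(x, w)
    return [hi + xi for hi, xi in zip(h, x)]

# Example: with w = [0.5, 0.5], f(x) alone would shrink activations,
# but the skip path keeps the input signal alive.
out = encoder_block_with_skip([1.0, 2.0], [0.5, 0.5])
print(out)  # [1.5, 3.0]
```

The same pattern underlies residual networks such as ResNet-152, one of the feature extractors compared in the paper.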
dc.identifier.citationSpringer Proceedings in Mathematics and Statistics, 2023, Vol. 401, p. 515-527
dc.identifier.issn21941009
dc.identifier.urihttps://doi.org/10.1007/978-3-031-15175-0_42
dc.identifier.urihttps://idr.nitk.ac.in/handle/123456789/29506
dc.publisherSpringer
dc.subjectBiLSTM
dc.subjectCNN
dc.subjectSentence vector
dc.subjectSkip-connection
dc.subjectVideo captioning
dc.titleVideo to Text Generation Using Sentence Vector and Skip Connections
