A Novel Approach for Video Captioning Based on Semantic Cross Embedding and Skip-Connection

dc.contributor.author: Radarapu, R.
dc.contributor.author: Bandari, N.
dc.contributor.author: Muthyam, S.
dc.contributor.author: Naik, D.
dc.date.accessioned: 2026-02-06T06:36:17Z
dc.date.issued: 2021
dc.description.abstract: Video Captioning is the task of describing the content of a video in simple natural language. The Encoder-Decoder architecture is the most widely used architecture for this task. Recent works improve performance by using 3D Convolutional Neural Networks (CNNs) or Transformers, or by changing the structure of the basic Long Short-Term Memory (LSTM) units used in the Encoder-Decoder. In this paper, we propose the use of a sentence vector to improve the performance of the Encoder-Decoder model. This sentence vector acts as an intermediary between the video space and the text space. Because it bridges the two vector spaces, we refer to it in this paper as a semantic cross embedding. The sentence vector is generated from the video and is used by the Decoder, along with the previously generated words, to generate a suitable description. We also use a skip-connection in the Encoder part of the model. A skip-connection is usually employed to tackle the vanishing-gradient problem in deep neural networks. However, our experiments show that, in our model, a two-layer LSTM with a skip-connection performs better than a Bidirectional LSTM. Also, the use of a sentence vector improves performance considerably. All our experiments are performed on the MSVD dataset. © 2021, Springer Nature Singapore Pte Ltd.
dc.identifier.citation: Communications in Computer and Information Science, 2021, Vol. 1378 CCIS, p. 465-477
dc.identifier.issn: 18650929
dc.identifier.uri: https://doi.org/10.1007/978-981-16-1103-2_39
dc.identifier.uri: https://idr.nitk.ac.in/handle/123456789/30376
dc.publisher: Springer Science and Business Media Deutschland GmbH
dc.subject: Semantic cross embedding
dc.subject: Sentence vector
dc.subject: Skip-connection
dc.subject: Video captioning
dc.title: A Novel Approach for Video Captioning Based on Semantic Cross Embedding and Skip-Connection
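
The abstract describes a two-layer LSTM Encoder with a skip-connection and a sentence vector derived from the video. The following is a minimal NumPy sketch of that idea, not the authors' implementation. Everything specific in it is an assumption made for illustration: the additive form of the skip-connection, the mean-pooling and tanh projection used to form the sentence vector, the random weights, all dimensions, and the names LSTMLayer, encode_with_skip and sentence_vector. The Decoder and training are omitted.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMLayer:
    """A single LSTM layer with random, untrained weights (illustrative only)."""

    def __init__(self, input_dim, hidden_dim, rng):
        scale = 1.0 / np.sqrt(hidden_dim)
        # One weight matrix covering the four gates: input, forget, candidate, output.
        self.W = rng.uniform(-scale, scale, (input_dim + hidden_dim, 4 * hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        self.hidden_dim = hidden_dim

    def forward(self, xs):
        """Map a sequence xs of shape (T, input_dim) to hidden states (T, hidden_dim)."""
        h = np.zeros(self.hidden_dim)
        c = np.zeros(self.hidden_dim)
        outputs = []
        for x in xs:
            z = np.concatenate([x, h]) @ self.W + self.b
            i, f, g, o = np.split(z, 4)
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
            h = sigmoid(o) * np.tanh(c)
            outputs.append(h)
        return np.stack(outputs)

def encode_with_skip(frame_features, layer1, layer2):
    """Two stacked LSTM layers; the first layer's output skips around the second."""
    h1 = layer1.forward(frame_features)
    h2 = layer2.forward(h1)
    return h1 + h2  # assumed additive skip-connection

def sentence_vector(encoded, W_proj):
    """Assumed pooling: average the encoder states over time, then project."""
    return np.tanh(encoded.mean(axis=0) @ W_proj)

# Hypothetical sizes: 8 frames, 16-dim frame features, 32 hidden units, 24-dim sentence vector.
rng = np.random.default_rng(0)
T, feat_dim, hidden_dim, sent_dim = 8, 16, 32, 24
frames = rng.normal(size=(T, feat_dim))
layer1 = LSTMLayer(feat_dim, hidden_dim, rng)
layer2 = LSTMLayer(hidden_dim, hidden_dim, rng)
W_proj = rng.normal(scale=0.1, size=(hidden_dim, sent_dim))

encoded = encode_with_skip(frames, layer1, layer2)
s = sentence_vector(encoded, W_proj)
print(encoded.shape, s.shape)  # (8, 32) (24,)
```

In a full model along the lines of the abstract, a Decoder would receive s together with the previously generated words at each step; how the two are combined is not stated in this record.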
