A Novel Approach for Video Captioning Based on Semantic Cross Embedding and Skip-Connection

Radarapu, R.; Bandari, N.; Muthyam, S.; Naik, D.

dc.contributor.author	Radarapu, R.
dc.contributor.author	Bandari, N.
dc.contributor.author	Muthyam, S.
dc.contributor.author	Naik, D.
dc.date.accessioned	2026-02-06T06:36:17Z
dc.date.issued	2021
dc.description.abstract	Video captioning is the task of describing the content of a video in simple natural language. The Encoder-Decoder architecture is the most widely used architecture for this task. Recent works improve performance by using 3D Convolutional Neural Networks (CNNs) or Transformers, or by changing the structure of the basic Long Short-Term Memory (LSTM) units used in the Encoder-Decoder. In this paper, we propose the use of a sentence vector to improve the performance of the Encoder-Decoder model. This sentence vector acts as an intermediary between the video space and the text space; we therefore refer to it as a semantic cross embedding that bridges the two vector spaces. The sentence vector is generated from the video and is used by the Decoder, along with previously generated words, to generate a suitable description. We also employ a skip-connection in the Encoder part of the model. Skip-connections are usually employed to tackle the vanishing gradients problem in deep neural networks. However, our experiments show that a two-layer LSTM with a skip-connection performs better than the Bidirectional LSTM for our model. Also, the use of a sentence vector improves performance considerably. All our experiments are performed on the MSVD dataset. © 2021, Springer Nature Singapore Pte Ltd.
dc.identifier.citation	Communications in Computer and Information Science, 2021, Vol. 1378 CCIS, p. 465-477
dc.identifier.issn	18650929
dc.identifier.uri	https://doi.org/10.1007/978-981-16-1103-2_39
dc.identifier.uri	https://idr.nitk.ac.in/handle/123456789/30376
dc.publisher	Springer Science and Business Media Deutschland GmbH
dc.subject	Semantic cross embedding
dc.subject	Sentence vector
dc.subject	Skip-connection
dc.subject	Video captioning
dc.title	A Novel Approach for Video Captioning Based on Semantic Cross Embedding and Skip-Connection
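
The encoder idea summarized in the abstract (a two-layer LSTM with a skip-connection) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the record does not say where the skip-connection is placed or how the sentence vector is computed, so the additive connection from the first layer's output to the second layer's output, the mean-pooling helper `mean_pool`, and all names, sizes, and weight initializations below are assumptions made for illustration only.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class LSTMCell:
    """A plain LSTM cell with randomly initialized (untrained) weights."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        n = input_size + hidden_size
        # One weight row per gate unit; gates are stacked as
        # input, forget, candidate, output.
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(n)]
                  for _ in range(4 * hidden_size)]
        self.b = [0.0] * (4 * hidden_size)

    def step(self, x, h, c):
        z = x + h  # concatenate input and previous hidden state
        pre = [sum(w * v for w, v in zip(row, z)) + b
               for row, b in zip(self.W, self.b)]
        H = self.hidden_size
        i = [sigmoid(p) for p in pre[0:H]]
        f = [sigmoid(p) for p in pre[H:2 * H]]
        g = [math.tanh(p) for p in pre[2 * H:3 * H]]
        o = [sigmoid(p) for p in pre[3 * H:4 * H]]
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new


def encode_with_skip(frames, cell1, cell2):
    """Run two stacked LSTM layers over per-frame feature vectors.

    Assumption: the skip-connection adds the layer-1 output to the
    layer-2 output at every time step.
    """
    H = cell1.hidden_size
    h1, c1 = [0.0] * H, [0.0] * H
    h2, c2 = [0.0] * H, [0.0] * H
    outputs = []
    for x in frames:
        h1, c1 = cell1.step(x, h1, c1)
        h2, c2 = cell2.step(h1, h2, c2)
        outputs.append([a + b for a, b in zip(h1, h2)])
    return outputs


def mean_pool(outputs):
    """Hypothetical stand-in for a video-derived sentence vector."""
    return [sum(col) / len(outputs) for col in zip(*outputs)]


if __name__ == "__main__":
    rng = random.Random(0)
    layer1 = LSTMCell(input_size=3, hidden_size=4, rng=rng)
    layer2 = LSTMCell(input_size=4, hidden_size=4, rng=rng)
    video = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.5]]  # two toy frame features
    encoded = encode_with_skip(video, layer1, layer2)
    print(len(encoded), len(mean_pool(encoded)))
```

Both layers share the same hidden size here so that their outputs can be added directly; with different sizes, a projection would be needed before the addition.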

Collections

Conference Papers