A novel Multi-Layer Attention Framework for visual description prediction using bidirectional LSTM

Naik, D.; Jaidhar, C.D.

dc.contributor.author	Naik, D.
dc.contributor.author	Jaidhar, C.D.
dc.date.accessioned	2026-02-04T12:27:23Z
dc.date.issued	2022
dc.description.abstract	The massive influx of text, images, and videos to the internet has recently increased the challenge of computer vision-based tasks in big data. Integrating visual data with natural language to generate video descriptions has been a challenge for decades. Recent work on image and video captioning that employs Long Short-Term Memory (LSTM) networks has drawn researchers' interest in its application to video captioning. The proposed video captioning architecture combines a bidirectional multilayer LSTM (BiLSTM) encoder with a unidirectional decoder. The architecture also considers temporal relations when building global video representations. In contrast to the majority of prior work, the most relevant features of a video are selected and used specifically for captioning. Existing methods use a single-layer attention mechanism to link visual input with phrase meaning. This approach employs LSTMs and a multilayer attention mechanism to extract features from videos, construct links between multi-modal (words and visual content) representations, and generate sentences with rich semantic coherence. The proposed system was evaluated on a benchmark dataset for video captioning. The results show superior performance relative to state-of-the-art works on METEOR and promising performance on BLEU. In terms of quantitative performance, the proposed approach outperforms most existing methods. © 2022, The Author(s).
dc.identifier.citation	Journal of Big Data, 2022, 9, 1, pp. -
dc.identifier.uri	https://doi.org/10.1186/s40537-022-00664-6
dc.identifier.uri	https://idr.nitk.ac.in/handle/123456789/22271
dc.publisher	Springer Science and Business Media Deutschland GmbH
dc.subject	Benchmarking
dc.subject	Convolutional neural networks
dc.subject	Long short-term memory
dc.subject	Multilayer neural networks
dc.subject	Multilayers
dc.subject	Network architecture
dc.subject	Semantics
dc.subject	Visual languages
dc.subject	Attention
dc.subject	Attention mechanisms
dc.subject	Convolutional neural network
dc.subject	Multi-layers
dc.subject	Natural languages
dc.subject	Performance
dc.subject	Text images
dc.subject	Video captioning
dc.subject	Vision based
dc.subject	Visual data
dc.subject	Computer vision
dc.title	A novel Multi-Layer Attention Framework for visual description prediction using bidirectional LSTM

Collections

Journal Articles
