Video summarization and captioning using dynamic mode decomposition for surveillance
Date
2021
Publisher
Springer Science and Business Media B.V.
Abstract
Video surveillance has become a major tool in security maintenance. However, reviewing recorded footage to detect motion is tedious, because motion typically occurs in only a short portion of the video. Much time is wasted analyzing the video, and it is not always possible to find the exact frame where a transition occurred. There is therefore a need for a summary video that captures any changes or motion. With advances in image processing using OpenCV and in deep learning, video summarization is no longer an impossible task. Captions are generated for the summarized videos using an encoder–decoder captioning model. Large, well-labeled data sets such as Common Objects in Context (COCO) and Microsoft Video Description (MSVD) make video captioning a feasible task. Since the arrival of long short-term memory (LSTM) networks, encoder–decoder models have been used extensively to generate text from visual features, and attention mechanisms have been widely used in the decoder for video captioning. Keyframes are obtained from very long videos using methods such as dynamic mode decomposition (DMD), an algorithm from fluid dynamics, and OpenCV’s absdiff(). We propose these tools for motion detection and video/image captioning for very long videos, which are common in video surveillance. © 2021, Bharati Vidyapeeth's Institute of Computer Applications and Management.
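The frame-differencing idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: it uses a pure-Python stand-in for the per-pixel absolute difference that OpenCV's absdiff() computes, so that it runs without OpenCV, and the helper names (`abs_diff`, `motion_score`, `select_keyframes`) and both thresholds are assumptions chosen for illustration.

```python
from typing import List, Sequence

# A grayscale frame as rows of 0-255 intensities (assumed representation).
Frame = List[List[int]]


def abs_diff(a: Frame, b: Frame) -> Frame:
    """Per-pixel absolute difference of two equally sized frames."""
    return [[abs(pa - pb) for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


def motion_score(a: Frame, b: Frame, pixel_threshold: int = 25) -> float:
    """Fraction of pixels whose intensity changed by more than pixel_threshold."""
    diff = abs_diff(a, b)
    total = sum(len(row) for row in diff)
    changed = sum(1 for row in diff for value in row if value > pixel_threshold)
    return changed / total if total else 0.0


def select_keyframes(
    frames: Sequence[Frame],
    pixel_threshold: int = 25,          # hypothetical value, tune per camera
    min_changed_fraction: float = 0.01,  # hypothetical value, tune per scene
) -> List[int]:
    """Return indices of frames that differ noticeably from the previous frame."""
    keyframes = []
    for i in range(1, len(frames)):
        score = motion_score(frames[i - 1], frames[i], pixel_threshold)
        if score >= min_changed_fraction:
            keyframes.append(i)
    return keyframes


def blank(height: int = 8, width: int = 8) -> Frame:
    """An all-black synthetic frame."""
    return [[0] * width for _ in range(height)]


def with_block(top: int, left: int, size: int = 2, value: int = 255) -> Frame:
    """A black frame containing one bright square, standing in for a moving object."""
    frame = blank()
    for r in range(top, top + size):
        for c in range(left, left + size):
            frame[r][c] = value
    return frame


# Synthetic clip: static, object appears, stays still, moves, disappears.
frames = [blank(), blank(), with_block(1, 1), with_block(1, 1), with_block(4, 4), blank()]
keyframes = select_keyframes(frames)
print(keyframes)  # [2, 4, 5]
```

With real footage, the same loop would typically read frames through OpenCV and call absdiff() on consecutive grayscale frames; only the frames flagged here would then be passed on to summarization and captioning.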
Keywords
Attention, CNN, DMD, Encoder–decoder, LSTM, MSVD, OpenCV
Citation
International Journal of Information Technology (Singapore), 2021, 13, 5, pp. 1927-1936
