Title: A framework for estimating geometric distortions in video copies based on visual-audio fingerprints
Authors: Roopalakshmi, R.; Guddeti, G.R.M.
Date: 2026-02-05
Year: 2015
Citation: Signal, Image and Video Processing, 2015, 9, 1, pp. 201-210
ISSN: 1863-1703
DOI: https://doi.org/10.1007/s11760-013-0424-7
URI: https://idr.nitk.ac.in/handle/123456789/26344

Abstract: Spatio-temporal alignment and estimation of the distortion model between pirate and master video content are prerequisites for approximating the illegal capture location in a theater. State-of-the-art techniques exploit only the visual features of videos for alignment and distortion model estimation of watermarked sequences, while few efforts have been made toward acoustic features and non-watermarked video content. To address this, we propose a distortion model estimation framework based on multimodal signatures, which integrates several components: (1) a compact representation of a video using visual-audio fingerprints derived from Speeded Up Robust Features (SURF) and Mel-Frequency Cepstral Coefficients (MFCC); (2) a segmentation-based bipartite matching scheme to obtain accurate temporal alignments; (3) stable frame pair extraction followed by filtering policies to achieve geometric alignments; and (4) distortion model estimation in terms of a homography matrix. Experiments on camcorded datasets demonstrate promising results for the proposed framework compared to the reference methods. © 2013, Springer-Verlag London.

Keywords: Alignment; Geometry; Speech recognition; Video recording; DLT; Duplicate video; Frame alignments; Geometric distortion; MFCC; SURF; Frequency estimation
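
The abstract describes the distortion model as a homography matrix, and the keywords list DLT (Direct Linear Transform). The record does not give the authors' implementation, so the following is only a minimal sketch of a plain, unnormalized DLT, assuming matched point pairs between a master frame and a pirate frame are already available. The function names `estimate_homography_dlt` and `apply_homography` are hypothetical and chosen for illustration.

```python
import numpy as np


def estimate_homography_dlt(src, dst):
    """Estimate a 3x3 homography H with dst ~ H @ src (basic DLT).

    Illustrative sketch only: no coordinate normalization and no
    outlier rejection, both of which a real pipeline would likely need.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    if src.shape != dst.shape or src.ndim != 2 or src.shape[1] != 2:
        raise ValueError("src and dst must both have shape (N, 2)")
    if src.shape[0] < 4:
        raise ValueError("need at least four point correspondences")

    # Each correspondence (x, y) -> (u, v) gives two linear equations
    # in the nine entries of H.
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.array(rows)

    # The least-squares solution of A h = 0 with ||h|| = 1 is the right
    # singular vector belonging to the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)

    # Fix the arbitrary scale; this assumes H[2, 2] is not zero.
    return H / H[2, 2]


def apply_homography(H, points):
    """Map (N, 2) points through H using homogeneous coordinates."""
    pts = np.asarray(points, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]
```

A possible use is to pass the coordinates of matched keypoints from one aligned frame pair as `src` and `dst`, then compare `apply_homography(H, src)` against `dst` to judge the fit. With raw pixel coordinates, normalizing the points before building the system generally improves numerical stability, and real matches would typically need outlier filtering first.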