Faculty Publications

Permanent URI for this community: https://idr.nitk.ac.in/handle/123456789/18736

Publications by NITK Faculty

Search Results

Now showing 1 - 3 of 3
  • Item
    A novel approach to video copy detection using audio fingerprints and PCA
    (Elsevier B.V., 2011) Roopalakshmi, R.; Guddeti, G.R.M.
    In the Content-Based Copy Detection (CBCD) literature, most state-of-the-art techniques focus primarily on the visual content of video. Exploiting audio fingerprints for the CBCD problem is necessary for the following reasons: audio content constitutes an indispensable information source, and transformations on audio content are limited compared to those on visual content. In this paper, a novel CBCD approach using audio features and PCA is proposed, which includes two stages: first, multiple feature vectors are computed using MFCC and four spectral descriptors; second, the features are further processed using PCA to provide a compact feature description. Experimental results on the TRECVID-2007 dataset demonstrate the efficiency of the proposed method against various transformations. © 2011 Published by Elsevier Ltd.
  • Item
    Multiclass SVM-based language-independent emotion recognition using selective speech features
    (Institute of Electrical and Electronics Engineers Inc., 2014) Kokane Amol, T.; Guddeti, G.R.M.
    In this paper, we focus on recognizing six basic emotions, viz. Anger, Disgust, Fear, Happiness, Neutral, and Sadness, using selective features of speech signals in different languages such as German and Telugu. The feature set includes thirteen Mel-Frequency Cepstral Coefficients (MFCCs) and four other speech-signal features: Energy, Short-Term Energy, Spectral Roll-Off, and Zero-Crossing Rate (ZCR). The Surrey Audio-Visual Expressed Emotion (SAVEE) database is used to train the multiclass Support Vector Machine (SVM) classifier, and the German corpus EMO-DB (Berlin Database of Emotional Speech) and the Telugu corpus IITKGP: SESC are used for emotion recognition. The results are analyzed for each speech emotion separately, with accuracies of 98.3071% and 95.8166% obtained for the EMO-DB and IITKGP: SESC databases, respectively. © 2014 IEEE.
  • Item
    A framework for estimating geometric distortions in video copies based on visual-audio fingerprints
    (Springer-Verlag London Ltd, 2015) Roopalakshmi, R.; Guddeti, G.R.M.
    Spatio-temporal alignment and estimation of the distortion model between pirate and master video contents are prerequisites for approximating the illegal capture location in a theater. State-of-the-art techniques exploit only the visual features of videos for alignment and distortion-model estimation of watermarked sequences, while few efforts address acoustic features and non-watermarked video content. To solve this, we propose a distortion-model estimation framework based on multimodal signatures, which fully integrates several components: compact representation of a video using visual-audio fingerprints derived from Speeded-Up Robust Features and Mel-Frequency Cepstral Coefficients; a segmentation-based bipartite matching scheme to obtain accurate temporal alignments; stable frame-pair extraction followed by filtering policies to achieve geometric alignment; and distortion-model estimation in terms of a homography matrix. Experiments on camcorded datasets demonstrate the promising results of the proposed framework compared to reference methods. © 2013, Springer-Verlag London.
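The first abstract's second stage compresses multi-dimensional audio feature vectors into a compact description with PCA. Below is a minimal pure-Python sketch of that kind of reduction, using power iteration with deflation; the feature values in the usage example are made up for illustration, whereas the paper's actual vectors come from MFCC and spectral descriptors:

```python
def pca_project(vectors, n_components=1, iters=200):
    """Project feature vectors onto their top principal component(s).

    Pure-Python sketch: center the data, build the covariance matrix,
    find dominant eigenvectors by power iteration, and deflate.
    """
    n = len(vectors)
    dim = len(vectors[0])
    # Center the data around the per-dimension mean.
    mean = [sum(v[d] for v in vectors) / n for d in range(dim)]
    centered = [[v[d] - mean[d] for d in range(dim)] for v in vectors]
    # Sample covariance matrix.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1)
            for j in range(dim)] for i in range(dim)]
    components = []
    for _ in range(n_components):
        # Power iteration converges to the dominant eigenvector.
        vec = [1.0] * dim
        for _ in range(iters):
            nxt = [sum(cov[i][j] * vec[j] for j in range(dim))
                   for i in range(dim)]
            norm = sum(x * x for x in nxt) ** 0.5
            vec = [x / norm for x in nxt]
        components.append(vec)
        # Deflate: remove the found component from the covariance matrix.
        lam = sum(vec[i] * sum(cov[i][j] * vec[j] for j in range(dim))
                  for i in range(dim))
        for i in range(dim):
            for j in range(dim):
                cov[i][j] -= lam * vec[i] * vec[j]
    # Project each centered vector onto the retained components.
    return [[sum(c[d] * comp[d] for d in range(dim)) for comp in components]
            for c in centered]

# Illustrative 2-D "features" collapsing onto one principal direction:
compact = pca_project([[1.0, 2.0], [2.0, 4.1], [3.0, 6.0]], n_components=1)
```

In practice a library routine (e.g. scikit-learn's `PCA`) would replace this hand-rolled version; the sketch only shows why the retained description is more compact than the original feature set.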
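The emotion-recognition abstract lists Short-Term Energy and Zero-Crossing Rate among its frame-level speech features. A minimal sketch of how two of those features are commonly computed per frame (the sample values below are illustrative and not taken from the paper):

```python
def short_term_energy(frame):
    """Mean squared amplitude of one analysis frame."""
    return sum(s * s for s in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

# A rapidly alternating frame has maximal ZCR; a smooth ramp has none.
alternating = [1.0, -1.0, 1.0, -1.0]
ramp = [0.1, 0.2, 0.3, 0.4]
```

Voiced, high-energy speech (e.g. vowels in angry utterances) tends toward high energy and low ZCR, while unvoiced fricatives show the opposite pattern, which is why such features carry emotional cues alongside the MFCCs.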
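The third abstract expresses the geometric distortion model as a homography matrix. A minimal sketch of how such a 3×3 matrix maps a master-frame coordinate into the camcorded copy, using homogeneous coordinates (the matrix values here are illustrative; the paper estimates the matrix from matched frame pairs):

```python
def apply_homography(H, x, y):
    """Map point (x, y) through 3x3 homography H in homogeneous coordinates."""
    xp = H[0][0] * x + H[0][1] * y + H[0][2]
    yp = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    # Divide by the homogeneous coordinate to return to image coordinates.
    return xp / w, yp / w

# A pure translation is the simplest special case of a homography:
shift = [[1.0, 0.0, 2.0],
         [0.0, 1.0, 3.0],
         [0.0, 0.0, 1.0]]
```

Recovering `H` from stable frame pairs (e.g. via a RANSAC-style fit over SURF matches) yields the distortion model the framework uses to reason about the camcorder's position in the theater.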