Faculty Publications
Permanent URI for this community: https://idr.nitk.ac.in/handle/123456789/18736
Publications by NITK Faculty
Search Results
2 results
Item: Singer Identification from Smaller Snippets of Audio Clips Using Acoustic Features and DNNs (Institute of Electrical and Electronics Engineers Inc., 2018)
Authors: S Murthy, Y.V.; Jeshventh Raja, T.K.R.; Zoeb, M.; Saumyadip, M.; Koolagudi, S.G.
Singer identification (SID) is one of the crucial tasks of music information retrieval (MIR). The presence of background accompaniment makes the task more challenging. This work analyses the performance of SID using a combination of cepstral and chromagram features. Mel-frequency cepstral coefficients (MFCCs) and linear prediction cepstral coefficients (LPCCs) are computed as cepstral features and combined with a 12-dimensional chroma vector obtained from the chromagram. Two datasets are used for experimentation: the standard artist-20 dataset and a newly proposed Indian singers database comprising 20 Indian singers. Two classifiers, random forest (RF) and deep neural networks (DNNs), are considered based on their performance in identifying singers. The proposed approach is found to be effective even when the input clip is only five seconds long. © 2018 IEEE.

Item: Recognition of emotions from video using acoustic and facial features (Springer-Verlag London Ltd, 2015)
Authors: Sreenivasa Rao, K.S.; Koolagudi, S.
In this paper, acoustic and facial features extracted from video are explored for recognizing emotions. The temporal variation of the gray values of pixels within the eye and mouth regions is used as a feature to capture emotion-specific knowledge from facial expressions. Acoustic features representing spectral and prosodic information are explored for recognizing emotions from the speech signal. Autoassociative neural network models are used to capture the emotion-specific information from the acoustic and facial features. The basic objective of this work is to examine the capability of the proposed acoustic and facial features to capture emotion-specific information. Further, the correlations among the feature sets are analyzed by combining the evidence at different levels. The recognition performance of the systems developed using acoustic and facial features is observed to be 85.71% and 88.14%, respectively. Combining the evidence of the models developed using acoustic and facial features improves the recognition performance to 93.62%. The performance of the emotion recognition systems developed using neural network models is compared with hidden Markov models, Gaussian mixture models, and support vector machine models. The proposed features and models are evaluated on a real-life emotional database, the Interactive Emotional Dyadic Motion Capture database, collected at the University of Southern California. © 2013, Springer-Verlag London.
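
For the first item, the following is a minimal sketch of the cepstral-plus-chroma feature idea, assuming librosa and scikit-learn as tooling; the sampling rate, number of MFCCs, tree count, and the training-data placeholders are illustrative assumptions rather than the authors' settings, and LPCC extraction is omitted for brevity.

```python
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier

def clip_features(path, duration=5.0, sr=22050):
    """Average MFCC and 12-dim chroma features over a short audio snippet."""
    y, sr = librosa.load(path, sr=sr, duration=duration)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)    # cepstral part
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)      # 12-dim chroma part
    # Concatenate frame-averaged features into one fixed-length vector.
    return np.concatenate([mfcc.mean(axis=1), chroma.mean(axis=1)])

# Hypothetical training loop: `train_paths` and `train_singers` are placeholders.
# X = np.stack([clip_features(p) for p in train_paths])
# clf = RandomForestClassifier(n_estimators=200).fit(X, train_singers)
# print(clf.predict([clip_features("test_clip.wav")]))
```

Averaging frame-level features into a single fixed-length vector per snippet is one simple way to let a five-second clip map to a single classifier input.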
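For the second item, the following is a minimal sketch of score-level fusion of acoustic and facial evidence, written as a weighted sum of normalised per-class scores; the weight, class count, and score values are hypothetical placeholders, not the paper's autoassociative-network outputs.

```python
import numpy as np

def fuse_scores(acoustic_scores, facial_scores, w=0.5):
    """Combine normalised per-emotion scores from the two modalities."""
    a = acoustic_scores / np.sum(acoustic_scores)
    f = facial_scores / np.sum(facial_scores)
    return w * a + (1.0 - w) * f

# Example with hypothetical scores over four emotion classes.
acoustic = np.array([0.1, 0.6, 0.2, 0.1])   # from the acoustic-feature model
facial = np.array([0.2, 0.5, 0.2, 0.1])     # from the facial-feature model
combined = fuse_scores(acoustic, facial, w=0.4)
print("Predicted emotion index:", int(np.argmax(combined)))
```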
