1. Faculty Publications

Permanent URI for this community: https://idr.nitk.ac.in/handle/1/5

Search Results

Now showing 1 - 10 of 89
  • Item
    Content-based music information retrieval (CB-MIR) and its applications toward the music industry: A review
    (2018) Srinivasa Murthy Y.V.; Koolagudi, S.G.
    A huge increase in the number of digital music tracks has created the need for automated tools to extract useful information from these tracks. As this information is extracted from the contents of the music, the field is known as content-based music information retrieval (CB-MIR). Over the past two decades, several research outcomes have been reported in the area of CB-MIR, and there is a need to consolidate and critically analyze these findings to evolve future research directions. In this survey article, various tasks of CB-MIR and their applications are critically reviewed. In particular, the article focuses on eight MIR-related tasks: vocal/non-vocal segmentation, artist identification, genre classification, raga identification, query-by-humming, emotion recognition, instrument recognition, and music clip annotation. The fundamental concepts of Indian classical music are detailed to attract future research on this topic. The article elaborates on the signal-processing techniques used to extract features for the tasks mentioned above and discusses their strengths as well as weaknesses. It also points to some general research issues in CB-MIR and probable approaches toward their solutions, so as to improve the efficiency of existing CB-MIR systems. 2018 Copyright is held by the owner/author(s). © 2018 Association for Computing Machinery. All rights reserved.
  • Item
    Segmentation and characterization of acoustic event spectrograms using singular value decomposition
    (2019) Mulimani, M.; Koolagudi, S.G.
    Traditional frame-based speech features such as Mel-frequency cepstral coefficients (MFCCs) were developed specifically for speech/speaker recognition tasks. Speech differs from acoustic events in its phonetic structure; hence, frame-based speech features may not be suitable for Acoustic Event Classification (AEC). In this paper, a novel method is proposed for extracting robust, acoustic-event-specific features from the spectrogram using a left singular vector for AEC. It consists of two main stages: segmentation and characterization of acoustic event spectrograms. In the first stage, the symmetric Laplacian matrix of an acoustic event spectrogram is decomposed into singular values and vectors. The reliable region (spectral shape) of an acoustic event is then segmented from the spectrogram using a left singular vector: prominent values of the left singular vector, selected using the proposed threshold, automatically segment the reliable region of the acoustic event from the spectrogram. In the second stage, the segmented region of the spectrogram is used as a feature vector for AEC, and the characteristics of singular-vector values belonging to reliable (event) and unreliable (non-event) regions of the spectrogram are determined. To evaluate the proposed approach, different categories of home acoustic events are considered from the Freiburg-106 dataset. The results show significantly improved performance in acoustic event segmentation and classification: the singular vector effectively segments the reliable region of the acoustic event from the spectrogram for a Support Vector Machine (SVM)-based AEC system. The proposed AEC system is robust to noise and achieves a higher recognition rate in clean and noisy conditions compared to traditional speech-feature-based systems. 2018 Elsevier Ltd
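    The segmentation stage can be sketched roughly as below, assuming a simple energy-based affinity between frequency bins and a mean-based threshold (the paper's exact Laplacian construction and threshold rule are not reproduced here):

```python
import numpy as np

def segment_spectrogram(S, scale=1.0):
    """Segment the prominent (event) region of a magnitude spectrogram S
    (freq_bins x frames) using the first left singular vector of a
    symmetric Laplacian built from S. The affinity and threshold below
    are illustrative assumptions, not the paper's exact choices."""
    W = S @ S.T                      # symmetric affinity between frequency bins
    D = np.diag(W.sum(axis=1))      # degree matrix
    L = D - W                       # unnormalized symmetric Laplacian
    U, s, Vt = np.linalg.svd(L)     # L = U diag(s) Vt
    u1 = np.abs(U[:, 0])            # first left singular vector
    mask = u1 >= scale * u1.mean()  # keep bins with prominent values
    return mask, S[mask, :]         # segmented (reliable) region as features

# Toy spectrogram: energy concentrated in frequency bins 2-4.
rng = np.random.default_rng(0)
S = 0.01 * rng.random((8, 20))
S[2:5, :] += 1.0
mask, seg = segment_spectrogram(S)
```

The segmented rows (`seg`) would then feed the SVM classifier in place of frame-based MFCCs.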
  • Item
    Scalable and fair forwarding of elephant and mice traffic in software defined networks
    (2015) Hegde, S.; Koolagudi, S.G.; Bhattacharya, S.
    A software defined network decouples the control and data planes of the networking devices and places the control plane of all the switches in a central server. These flow-based networks do not scale well because of the increased number of switch-to-controller communications, the limited size of flow tables, and the increased size of flow-table entries in the switches. In our work, we use labels to convey control information about path and policy in the packet. This keeps the core of the network simple, with all routing and policy decisions taken at the edge. The routing algorithm splits elephant traffic into mice and distributes them across multiple paths, ensuring that latency-sensitive mice traffic is not adversely affected by elephant traffic. We observed that label-based forwarding and traffic splitting work well together to enable scalable and fair forwarding. Our approach is topology independent. We present here a few preliminary simulation results obtained by running our routing algorithm on random network topologies. 2015 Elsevier B.V.
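    The traffic-splitting idea, breaking an elephant flow into mice-sized chunks spread over multiple labelled paths, can be sketched as follows (a minimal illustration; the paper's label encoding and actual routing policy are not shown, and the path names are hypothetical):

```python
from itertools import cycle

def split_elephant(flow_bytes, mice_size, paths):
    """Split an elephant flow into mice-sized chunks and assign them
    round-robin across the available paths. Returns a list of
    (offset, size, path_label) tuples."""
    chunks = []
    path_iter = cycle(paths)
    offset = 0
    while offset < flow_bytes:
        size = min(mice_size, flow_bytes - offset)
        chunks.append((offset, size, next(path_iter)))
        offset += size
    return chunks

# A 10 KB elephant flow split into <= 3 KB mice over three paths.
chunks = split_elephant(flow_bytes=10_000, mice_size=3_000, paths=["P1", "P2", "P3"])
```

Because each chunk is no larger than a mouse flow, latency-sensitive mice sharing any one path are never queued behind the whole elephant.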
  • Item
    Robust Acoustic Event Classification using Fusion Fisher Vector features
    (2019) Mulimani, M.; Koolagudi, S.G.
    In this paper, novel Fusion Fisher Vector (FFV) features are proposed for Acoustic Event Classification (AEC) in meeting-room environments. The monochrome images of a pseudo-color spectrogram of an acoustic event are represented as Fisher vectors. First, irrelevant feature dimensions of each Fisher vector are discarded using Principal Component Analysis (PCA); the resulting Fisher vectors are then fused to obtain the FFV features. The performance of the FFV features is evaluated on acoustic events of the UPC-TALP dataset in clean and different noisy conditions. Results show that the proposed FFV features are robust to noise and achieve an overall recognition accuracy of 94.32% across clean and noisy conditions. 2019 Elsevier Ltd
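    The reduce-then-fuse step can be sketched as below, assuming one Fisher vector per monochrome channel per clip; the Fisher-vector encoding itself is omitted and all dimensions are illustrative:

```python
import numpy as np

def pca_reduce(X, k):
    """Project rows of X (n_samples x dim) onto the top-k principal
    components, discarding the remaining dimensions."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def fuse_fisher_vectors(channels, k):
    """PCA-reduce the Fisher vectors of each monochrome channel, then
    concatenate them into Fusion Fisher Vector (FFV) features."""
    return np.hstack([pca_reduce(X, k) for X in channels])

# Stand-in data: 3 monochrome channels, 10 clips, 64-dim Fisher vectors.
rng = np.random.default_rng(1)
channels = [rng.random((10, 64)) for _ in range(3)]
ffv = fuse_fisher_vectors(channels, k=8)   # 10 clips x (3 * 8) fused dims
```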
  • Item
    Recognition of emotions from video using acoustic and facial features
    (2015) Rao, K.S.; Koolagudi, S.G.
    In this paper, acoustic and facial features extracted from video are explored for recognizing emotions. The temporal variation of gray values of pixels within the eye and mouth regions is used as a feature to capture emotion-specific knowledge from facial expressions. Acoustic features representing spectral and prosodic information are explored for recognizing emotions from the speech signal. Autoassociative neural network models are used to capture the emotion-specific information from the acoustic and facial features. The basic objective of this work is to examine the capability of the proposed acoustic and facial features to capture emotion-specific information. Further, the correlations among the feature sets are analyzed by combining the evidences at different levels. The performance of the emotion recognition systems developed using acoustic and facial features is observed to be 85.71% and 88.14%, respectively. Combining the evidences of the models developed using acoustic and facial features improved the recognition performance to 93.62%. The performance of the emotion recognition systems developed using neural network models is compared with hidden Markov models, Gaussian mixture models, and support vector machine models. The proposed features and models are evaluated on a real-life emotional database, the Interactive Emotional Dyadic Motion Capture database, collected at the University of Southern California. 2013, Springer-Verlag London.
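    Score-level combination of the two evidences can be sketched as a weighted sum over per-emotion scores (a toy illustration; the paper fuses autoassociative-network evidences, and the scores and weight here are made up):

```python
def fuse_scores(acoustic, facial, w=0.5):
    """Weighted score-level fusion of per-emotion evidences from the
    acoustic model and the facial model."""
    return {e: w * acoustic[e] + (1 - w) * facial[e] for e in acoustic}

def predict(acoustic, facial, w=0.5):
    """Pick the emotion with the highest fused score."""
    fused = fuse_scores(acoustic, facial, w)
    return max(fused, key=fused.get)

# Hypothetical per-emotion scores from the two modalities.
acoustic = {"angry": 0.7, "happy": 0.2, "sad": 0.1}
facial   = {"angry": 0.4, "happy": 0.5, "sad": 0.1}
label = predict(acoustic, facial, w=0.6)
```

Giving complementary modalities a say in the final decision is what lifts the combined accuracy above either single-modality system.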
  • Item
    Raga and Tonic Identification in Carnatic Music
    (2017) Samsekai Manjabhat, S.; Koolagudi, S.G.; Rao, K.S.; Ramteke, P.B.
    Raga and tonic are the basic elements on which melody is constructed in Carnatic music. Raga is the framework for building melody, whereas the tonic frequency establishes the base, and a swara (R or G, etc.) is identified relative to that base frequency. In this work, an effort has been made to identify the raga and tonic of a given piece of Carnatic music. The proposed method is divided into two phases. In the first phase, tonic and raga are determined independently using features extracted from the pitch histogram. In the second phase, raga and tonic are updated iteratively using the derived note information. The raga is recognised based on features extracted from the probability density function (pdf) of pitch values extracted from the music clip. Raga identification is performed using different classifiers, namely a feedforward neural network model, Gaussian mixture models, and decision trees. A mathematical model based on the parameters of the pitch pdf is proposed for tonic identification. The proposed raga and tonic identification system is evaluated on two datasets: 213 music clips from 14 ragas, and the CompMusic dataset (538 clips from 17 ragas). For the first dataset, the average accuracy of raga and tonic identification is found to be 90.14% and 94.83%, respectively. With the CompMusic dataset, an average accuracy of 95% is achieved for raga identification. 2017 Informa UK Limited, trading as Taylor & Francis Group.
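    A common starting point for such pitch-distribution features is a tonic-normalized, octave-folded pitch histogram, sketched below (the paper fits a pdf and derives parametric features; this binned histogram is a simplified stand-in, and the pitch values are made up):

```python
import numpy as np

def pitch_histogram(pitch_hz, tonic_hz, bins_per_octave=12):
    """Fold pitch values (Hz) into one octave relative to the tonic and
    build a normalized histogram; its shape can serve as a raga feature."""
    cents = 1200.0 * np.log2(np.asarray(pitch_hz) / tonic_hz)
    folded = np.mod(cents, 1200.0)                       # fold into one octave
    edges = np.linspace(0.0, 1200.0, bins_per_octave + 1)
    hist, _ = np.histogram(folded, bins=edges)
    return hist / hist.sum()

tonic = 146.83                               # example tonic (~D3)
pitches = [146.83, 220.0, 293.66, 165.0]     # Sa, ~Pa, Sa (octave up), ~Ri
h = pitch_histogram(pitches, tonic)
```

Normalizing by the tonic is what makes the histogram comparable across singers with different base frequencies.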
  • Item
    Prediction model for peninsular Indian summer monsoon rainfall using data mining and statistical approaches
    (2017) Vathsala, H.; Koolagudi, S.G.
    In this paper we discuss a data mining application for predicting peninsular Indian summer monsoon rainfall, and propose an algorithm that combines data mining and statistical techniques. We select likely predictors based on association rules that have the highest confidence levels. We then cluster the selected predictors to reduce their dimensionality and use cluster membership values for classification. We derive the predictors from local conditions in southern India, including mean sea level pressure, wind speed, and maximum and minimum temperatures. The global condition variables include Southern Oscillation and Indian Ocean Dipole conditions. The algorithm predicts rainfall in five categories: Flood, Excess, Normal, Deficit, and Drought. We use closed itemset mining, cluster membership calculations, and a multilayer perceptron function in the algorithm to predict monsoon rainfall in peninsular India. Using Indian Institute of Tropical Meteorology data, we found the prediction accuracy of the proposed approach to be exceptionally good. 2016 Elsevier Ltd
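    The dimensionality-reduction step via cluster membership can be sketched as below, using fuzzy c-means-style membership degrees as an illustrative stand-in for the paper's calculation (the data, centroid count, and membership formula are assumptions):

```python
import numpy as np

def cluster_membership(X, centroids):
    """Replace each sample (row of X) with its membership degrees to a
    fixed set of cluster centroids, reducing n_features columns to
    n_clusters columns. Memberships are inverse-square-distance
    weights normalized to sum to 1 per sample."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    d = np.maximum(d, 1e-12)                 # guard against zero distance
    inv = 1.0 / d**2
    return inv / inv.sum(axis=1, keepdims=True)

# Stand-in data: 6 years of 30 selected predictors, 4 clusters.
rng = np.random.default_rng(2)
X = rng.random((6, 30))
centroids = rng.random((4, 30))
M = cluster_membership(X, centroids)        # 6 x 4 membership features
```

The 4-column membership matrix `M`, rather than the 30 raw predictors, would then be fed to the multilayer perceptron classifier.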
  • Item
    Phoneme boundary detection from speech: A rule based approach
    (2019) Ramteke, P.B.; Koolagudi, S.G.
    In this paper, a novel approach is proposed for the automatic segmentation of a speech signal into phonemes. In a well-spoken word, phonemes can be characterized by the changes observed in the speech waveform. To obtain phoneme boundaries, signal-level properties of the speech waveform, i.e. changes in the waveform during the transition from one phoneme to the other, are explored. The problem of phoneme-level segmentation is addressed from two aspects: (1) segmentation of phonemes between voiced and unvoiced portions, and (2) segmentation of phonemes within voiced and unvoiced regions. Pitch and the zero-frequency filtered signal are used to locate the regions of change from voiced to unvoiced and vice versa. Phoneme boundaries within voiced and unvoiced regions are approximated using properties of the power spectrum of the correlation of adjacent frames of the signal. A finite set of rules is proposed on the variations observed in the power spectrum during phoneme transitions. The segmentation results of both approaches are combined to obtain the final phoneme boundaries. Three databases, namely the TIMIT corpus, the IIIT Hyderabad Marathi database, and the IIIT Hyderabad Hindi database (IIIT-H Indic Speech Databases), are used to test the proposed approach; accuracies of 95.40%, 96.87%, and 96.12%, respectively, are achieved within a tolerance of 10 ms. The proposed approach is observed to give precise phoneme boundaries. 2019 Elsevier B.V.
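    The voiced/unvoiced boundary step can be sketched with short-time energy and zero-crossing rate as simple voicing cues (the paper actually uses pitch and a zero-frequency filtered signal; the frame size and thresholds here are illustrative):

```python
import numpy as np

def voicing_boundaries(signal, frame_len=160, zcr_thresh=0.25, energy_thresh=0.01):
    """Mark each frame voiced/unvoiced using short-time energy and
    zero-crossing rate (ZCR), then return the frame indices where the
    voicing decision flips - candidate boundaries between voiced and
    unvoiced portions."""
    n = len(signal) // frame_len
    voiced = []
    for i in range(n):
        f = signal[i * frame_len:(i + 1) * frame_len]
        energy = float(np.mean(f ** 2))
        zcr = float(np.mean(np.abs(np.diff(np.sign(f))) > 0))
        voiced.append(energy > energy_thresh and zcr < zcr_thresh)
    return [i for i in range(1, n) if voiced[i] != voiced[i - 1]]

# Synthetic signal: a low-frequency tone (voiced-like) followed by
# white noise (unvoiced-like), 8 kHz sampling, 20 ms frames.
fs = 8000
t = np.arange(fs) / fs
tone = 0.5 * np.sin(2 * np.pi * 120 * t[:4000])
rng = np.random.default_rng(3)
noise = 0.5 * rng.standard_normal(4000)
boundaries = voicing_boundaries(np.concatenate([tone, noise]))
```

The within-region boundaries from the power-spectrum rules would then be merged with these flips to give the final phoneme boundaries.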
  • Item
    Music cryptography based on carnatic music
    (2019) Rao, D.; Koolagudi, S.G.
    Music and cryptography have been linked to one another since ancient times. The idea of replacing plaintext letters with music notes and sending the music file to the receiver is not new, but such replacements sometimes result in music clips that are not pleasant to listeners, thereby gaining unnecessary extra attention. Most of the work done in this area fails to ensure that the generated music clip conforms to any particular form of music; the melody of the clip is neglected. To address this issue, the current paper proposes a novel approach for sharing a secret message based on concepts of Carnatic classical music. The proposed method converts a message in textual format to a music clip before sending it to the receiver. The receiver can then decrypt the message using knowledge of the range of frequency values associated with each musical note, called a 'swara' in Carnatic classical music. Each plaintext character from the English alphabet is replaced by a different combination of swaras. The set of swaras mapped to each plaintext character is chosen so that the final music file produced as the output of encryption always conforms to a melodic form ('raga') governed by the framework of Carnatic classical music. Ten subject matter experts in the field of Carnatic music have given their opinion about the conformance of these music clips to the specified ragas. Also, the Mean Opinion Score (MOS) of 25 listeners has been tabulated to test and verify the melodic aspect of these music clips. BEIESP.
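    The character-to-swara substitution can be sketched as below. The letter-to-pair mapping is entirely hypothetical: the paper additionally constrains which combinations are allowed so that the resulting clip conforms to a chosen raga, which is not modelled here:

```python
from itertools import product
import string

# Hypothetical mapping: each lowercase letter -> a two-swara combination
# drawn from the seven Carnatic swaras. Raga conformance is NOT enforced.
SWARAS = ["S", "R", "G", "M", "P", "D", "N"]
PAIRS = ["".join(p) for p in product(SWARAS, repeat=2)]   # 49 combinations
SWARA_CODE = dict(zip(string.ascii_lowercase, PAIRS))     # 26 letters used
REVERSE = {v: k for k, v in SWARA_CODE.items()}

def encrypt(text):
    """Replace each letter with its swara combination; non-letters are
    dropped in this simplified sketch."""
    return " ".join(SWARA_CODE[c] for c in text.lower() if c in SWARA_CODE)

def decrypt(tokens):
    """Recover the plaintext from a space-separated swara sequence."""
    return "".join(REVERSE[t] for t in tokens.split())

cipher = encrypt("raga")
```

In the full scheme each swara token would be rendered as audio at its note's frequency range, and the receiver would recover the tokens from those frequencies before applying the reverse mapping.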
  • Item
    Long-range prediction of Indian summer monsoon rainfall using data mining and statistical approaches
    (2017) Vathsala, H.; Koolagudi, S.G.
    This paper presents a hybrid model to better predict Indian summer monsoon rainfall. The algorithm considers techniques suitable for processing dense datasets. The proposed three-step algorithm comprises closed-itemset-generation-based association rule mining for feature selection, cluster membership for dimensionality reduction, and a simple logistic function for prediction. An application predicting rainfall as flood, excess, normal, deficit, or drought, based on 36 predictors consisting of land and ocean variables, is presented. Results show good accuracy over the considered study period of 37 years (1969-2005). 2016, Springer-Verlag Wien.