Journal Articles

Permanent URI for this collection: https://idr.nitk.ac.in/handle/123456789/19884


Content-based music information retrieval (CB-MIR) and its applications toward the music industry: A review
(Association for Computing Machinery, 2019) Vishnu Srinivasa Murthy, Y.V.; Koolagudi, S.G.
A huge increase in the number of digital music tracks has created the necessity to develop an automated tool to extract the useful information from these tracks. As this information has to be extracted from the contents of the music, it is known as content-based music information retrieval (CB-MIR). In the past two decades, several research outcomes have been observed in the area of CB-MIR. There is a need to consolidate and critically analyze these research findings to evolve future research directions. In this survey article, various tasks of CB-MIR and their applications are critically reviewed. In particular, the article focuses on eight MIR-related tasks such as vocal/non-vocal segmentation, artist identification, genre classification, raga identification, query-by-humming, emotion recognition, instrument recognition, and music clip annotation. The fundamental concepts of Indian classical music are detailed to attract future research on this topic. The article elaborates on the signal-processing techniques to extract useful features for performing specific tasks mentioned above and discusses their strengths as well as weaknesses. This article also points to some general research issues in CB-MIR and probable approaches toward their solutions so as to improve the efficiency of the existing CB-MIR systems. 2018 Copyright is held by the owner/author(s). © 2018 Association for Computing Machinery. All rights reserved.
Mineral classification on Martian surface using CRISM hyperspectral data: a survey
(SPIE, 2023) Kumari, P.; Soor, S.; Shetty, A.; Koolagudi, S.G.
The Compact Reconnaissance Imaging Spectrometer for Mars (CRISM) has significantly advanced our understanding of the mineralogy of Mars. With its enhanced spectral and spatial resolution, CRISM has enabled the identification and characterization of various minerals on the Martian surface, providing valuable insights into Mars’ past climate and geologic history, as well as the evolution of the planet’s atmosphere and climate. We present a comprehensive review of mineral identification on Mars using CRISM data. We discuss the data description, pre-processing techniques, different spectrum libraries, geological characteristics used for mineral identification, challenges, and methodologies used for mineral classification, such as learning models, probabilistic methods, and neural networks. We highlight major findings of minerals on the Martian surface and discuss validation techniques. We conclude with a discussion of further research to address the existing gaps and challenges in this field. Overall, this review provides a general understanding of mineral classification using CRISM data and could serve as a helpful resource for researchers and scientists interested in planetary remote sensing and mineral identification on the Martian surface. © 2023 Society of Photo-Optical Instrumentation Engineers (SPIE)
Acoustic Event and Scene Classification: A Review
(Springer, 2025) Mulimani, M.; Venkatesh, S.; Koolagudi, S.G.
This paper gives deeper insight into the range of recent approaches developed and reported in the literature specifically for monophonic acoustic event classification (AEC), polyphonic acoustic event detection (AED) and acoustic scene classification (ASC) concerning datasets, features and classifiers. A list of datasets used for monophonic AEC, polyphonic AED and ASC is introduced. The features and classifiers used for monophonic AEC, polyphonic AED and ASC are reviewed with their successes and failures. A list of the research issues is derived from the critical review of the available literature at the end of the paper. © The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd. 2025.
Film segmentation and indexing using autoassociative neural networks
(2014) Sreenivasa Rao, K.S.; Nandi, D.; Koolagudi, S.G.
In this paper, Autoassociative Neural Network (AANN) models are explored for segmenting and indexing films (movies) using audio features. A two-stage method is proposed for segmenting the film into a sequence of scenes, and then indexing them appropriately. In the first stage, music and speech plus music segments of the film are separated, and music segments are labelled as title and fighting scenes based on their position. In the second stage, speech plus music segments are classified into normal, emotional, comedy and song scenes. In this work, Mel frequency cepstral coefficients (MFCCs), zero crossing rate and intensity are used as audio features for segmenting and indexing the films. The proposed segmentation and indexing method is evaluated on manually segmented Hindi films. From the evaluation results, it is observed that title, fighting and song scenes are segmented and indexed without any errors, and most of the errors are observed in discriminating the comedy and normal scenes. The performance of the proposed AANN models used for segmentation and indexing of the films is also compared with hidden Markov models, Gaussian mixture models and support vector machines. © 2013 Springer Science+Business Media New York.
Bird classification based on their sound patterns
(Springer New York LLC, 2016) Raghuram, M.A.; Chavan, N.R.; Belur, R.; Koolagudi, S.G.
In this paper we focus on automatic bird classification based on their sound patterns. This is useful in the field of ornithology for studying bird species and their behavior based on their sound. The proposed methodology may be used to conduct surveys of birds. The proposed methods may be used to automatically classify birds using different audio processing and machine learning techniques on the basis of their chirping patterns. An effort has been made in this work to map characteristics of birds, such as size, habitat, species and types of call, onto their sounds. This study is also part of a broader project that includes development of software and hardware systems to monitor the bird species that appear in different geographical locations, which helps ornithologists to monitor environmental conditions with respect to specific bird species. © 2016, Springer Science+Business Media New York.
Prediction model for peninsular Indian summer monsoon rainfall using data mining and statistical approaches
(Elsevier Ltd, 2017) Vathsala, H.; Koolagudi, S.G.
In this paper we discuss a data mining application for predicting peninsular Indian summer monsoon rainfall, and propose an algorithm that combines data mining and statistical techniques. We select likely predictors based on association rules that have the highest confidence levels. We then cluster the selected predictors to reduce their dimensions and use cluster membership values for classification. We derive the predictors from local conditions in southern India, including mean sea level pressure, wind speed, and maximum and minimum temperatures. The global condition variables include southern oscillation and Indian Ocean dipole conditions. The algorithm predicts rainfall in five categories: Flood, Excess, Normal, Deficit and Drought. We use closed itemset mining, cluster membership calculations and a multilayer perceptron function in the algorithm to predict monsoon rainfall in peninsular India. Using Indian Institute of Tropical Meteorology data, we found the prediction accuracy of our proposed approach to be exceptionally good. © 2016 Elsevier Ltd
Hierarchical secret sharing scheme using parts of speech of English grammar
(Inderscience Publishers, 2017) Chatterjee, S.; Koolagudi, S.G.
In this paper, a model to share secret information in conjunctive and disjunctive hierarchical access structures using obfuscation is proposed. Indistinguishability obfuscation is achieved with a context-free grammar (CFG) as a mimic function. Obfuscation is used to maintain confidentiality of the message in the presence of a dishonest distributor who is curious to know the secret. A new way to effectively reduce the size of the share is also achieved in this model. First, a mimic function is used to convert the statistical profile of the message to a random distribution of words from a chosen paragraph. The frequency distribution of different parts of speech (PoS) components of the obfuscated string is used to build a model for distribution of shares to n people based on the responsibility of the person in a hierarchy. Sharing the information and reconstruction of the original message are also shown. It is also shown that the obfuscation is secure against chosen-plaintext attack. © 2017 Inderscience Enterprises Ltd.
Raga and Tonic Identification in Carnatic Music
(Taylor and Francis Ltd., 2017) Samsekai Manjabhat, S.; Koolagudi, S.G.; Sreenivasa Rao, K.S.; Ramteke, P.B.
Raga and tonic are the basic elements on which melody is constructed in Carnatic music. Raga is the framework for building melody, whereas tonic frequency establishes the base, and a swara is identified (‘R’ or ‘G’ etc.) based on that base frequency. In this work, an effort has been made to identify the raga and tonic of a given piece of Carnatic music. The proposed method is divided into two phases. In the first phase, tonic and raga are determined independently using the features extracted from the pitch histogram. In the second phase, raga and tonic are updated iteratively using the derived note information. In this work, raga is recognised based on the features extracted from the probability density function (pdf) of pitch values extracted from the music clip. The raga identification is performed using different classifiers such as a feedforward neural network model, Gaussian Mixture Models and decision trees. A mathematical model based on the parameters of the pitch pdf is proposed for tonic identification. The proposed raga and tonic identification system is evaluated on two datasets: 213 music clips from 14 ragas and the CompMusic data-set (538 clips from 17 ragas). For the first data-set, the average accuracy of raga and tonic identification is found to be 90.14% and 94.83%, respectively. With the CompMusic data-set, an average accuracy of 95% is achieved for raga identification. © 2017 Informa UK Limited, trading as Taylor & Francis Group.
Long-range prediction of Indian summer monsoon rainfall using data mining and statistical approaches
(Springer-Verlag Wien, 2017) Vathsala, H.; Koolagudi, S.G.
This paper presents a hybrid model to better predict Indian summer monsoon rainfall. The algorithm considers suitable techniques for processing dense datasets. The proposed three-step algorithm comprises closed itemset generation-based association rule mining for feature selection, cluster membership for dimensionality reduction, and a simple logistic function for prediction. The application of predicting rainfall into flood, excess, normal, deficit, and drought based on 36 predictors consisting of land and ocean variables is presented. Results show good accuracy in the considered study period of 37 years (1969–2005). © 2016, Springer-Verlag Wien.
Dravidian language classification from speech signal using spectral and prosodic features
(Springer New York LLC, 2017) Koolagudi, S.G.; Bharadwaj, A.; Vishnu Srinivasa Murthy, Y.V.; Reddy, N.; Rao, P.
The interesting aspect of the Dravidian languages is a commonality through a shared script, similar vocabulary, and their common root language. In this work, an attempt has been made to classify the four complex Dravidian languages using cepstral coefficients and prosodic features. The speech of Dravidian languages has been recorded in various environments and considered as a database. It is demonstrated that while cepstral coefficients can indeed identify the language correctly with a fair degree of accuracy, prosodic features are added to the cepstral coefficients to improve language identification performance. Legendre polynomial fitting and principal component analysis (PCA) are applied to the feature vectors to reduce dimensionality, which further resolves the issue of time complexity. In the experiments conducted, it is found that using both cepstral coefficients and prosodic features, a language identification rate of around 87% is obtained, which is about 18% above the baseline system using Mel-frequency cepstral coefficients (MFCCs). It is observed from the results that temporal variations and prosody are important factors that need to be considered for the task of language identification. © 2017, Springer Science+Business Media, LLC.
