Conference Papers

Permanent URI for this collection: https://idr.nitk.ac.in/handle/123456789/28506

Now showing 1 - 10 of 10

Classification of vocal and non-vocal regions from audio songs using spectral features and pitch variations
(Institute of Electrical and Electronics Engineers Inc., 2015) Vishnu Srinivasa Murthy, Y.V.S.; Koolagudi, S.G.
In this work, an effort has been made to identify vocal and non-vocal regions from a given song using signal processing techniques and a machine learning algorithm. Initially, spectral features like mel-frequency cepstral coefficients (MFCCs) are used to develop the baseline system. Statistical values of pitch, jitter, and shimmer are considered to improve the performance of the system. Artificial neural networks (ANNs) are used to capture the characteristics of vocal and non-vocal segments of the songs. The experiment is conducted on 60 vocal and 60 non-vocal clips extracted from Telugu albums. An 11-point moving window is used to ensure the continuity of vocal and non-vocal segments, thus improving the accuracy of the system. With this approach, the system achieves 85.59% accuracy for vocal and 88.52% for non-vocal segment classification. © 2015 IEEE.
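The continuity step described above can be illustrated with a minimal sketch. The abstract does not say how the 11-point window is applied, so this example assumes a centered majority vote over frame-level labels; the function name `smooth_labels` and the sample data are hypothetical.

```python
from collections import Counter

def smooth_labels(labels, window=11):
    """Replace each frame label by the majority label in a centered window.

    Assumption: majority voting; the source only states that an
    11-point moving window is used to ensure continuity.
    """
    half = window // 2
    smoothed = []
    for i in range(len(labels)):
        # Truncate the window at the start and end of the sequence.
        lo = max(0, i - half)
        hi = min(len(labels), i + half + 1)
        counts = Counter(labels[lo:hi])
        smoothed.append(counts.most_common(1)[0][0])
    return smoothed

# 1 = vocal, 0 = non-vocal; frame 8 is an isolated misclassification.
frames = [1] * 8 + [0] + [1] * 8 + [0] * 10
smoothed = smooth_labels(frames)
```

In this sketch the isolated non-vocal frame inside the vocal run is relabeled as vocal, while the long non-vocal run at the end is left unchanged.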
Audio songs classification based on music patterns
(Springer Verlag, 2016) Sharma, R.; Vishnu Srinivasa Murthy, Y.V.S.; Koolagudi, S.G.
In this work, an effort has been made to classify audio songs based on their music pattern, which helps to retrieve music clips based on the listener’s taste. This task is helpful in indexing and accessing music clips based on the listener’s state. Seven main categories are considered for this work: devotional, energetic, folk, happy, pleasant, sad, and sleepy. Forty music clips of each category are used for the training phase and fifteen clips of each category for the testing phase. Vibrato-related features such as jitter and shimmer are considered along with the mel-frequency cepstral coefficients (MFCCs). Statistical values of pitch such as min, max, mean, and standard deviation are computed and added to the MFCCs, jitter, and shimmer, which results in a 19-dimensional feature vector. A feedforward backpropagation neural network (BPNN) is used as a classifier due to its efficiency in mapping nonlinear relations. An average accuracy of 82% is achieved on the 105 testing clips. © Springer India 2016.
Sound event detection in urban soundscape using two-level classification
(Institute of Electrical and Electronics Engineers Inc., 2016) Luitel, B.; Vishnu Srinivasa Murthy, Y.V.S.; Koolagudi, S.G.
A huge increase in the automobile field has led to the creation of different sounds in large volume, especially in urban cities. An analysis of the increased quantity of automobiles will give information related to traffic and vehicles. It also provides scope to understand the scenario of a particular location using soundscape information. In this paper, a two-level classification is proposed to classify urban sound events such as bus engine (BE), bus horn (BH), car horn (CH), and whistle (W) sounds. These sounds are chosen as they play a major role in the traffic scenario. Real-time data is collected from live recordings at major locations of the urban city. Prior to the detection of events, the class of events is identified using signal processing techniques. Further, features such as mel-frequency cepstral coefficients (MFCCs) are extracted based on the analysis of the spectrum of the above-mentioned events, and they are prominent enough to classify even in a complex scenario. Classifiers such as artificial neural networks (ANN), naive Bayes (NB), decision tree (J48), and random forest (RF) are used at two levels. The proposed approach outperforms the existing approaches, which usually perform direct feature extraction without signal-level analysis. © 2016 IEEE.
Detection of largest possible repeated patterns in Indian audio songs using spectral features
(Institute of Electrical and Electronics Engineers Inc., 2016) Thomas, M.; Vishnu Srinivasa Murthy, Y.V.S.; Koolagudi, S.G.
In the field of Content Based Music Information Retrieval (CB-MIR), researchers are always looking for better ways to classify songs aside from the existing classifiers such as genre, mood, scale, tempo, etc. By determining a way to isolate and extract maximum length repeating patterns (MLRPs) in a music file, we can analyze them in order to describe another potential classifier: complexity. Extraction of repeating patterns would also allow users to easily extract ringtones from their favorite songs. In this paper, an effort has been made to describe a method to extract repeating patterns from a given music file through direct signal level as well as feature level comparison. These extracted patterns can be used as ringtones, or for analysis to determine complexity. Features such as mel-frequency cepstral coefficients (MFCCs), modulation spectral features (MSFs) and jitter are computed to reduce the computational time observed in signal level comparison. © 2016 IEEE.
An Accelerated CPU Based Ray Tracer
(Institute of Electrical and Electronics Engineers Inc., 2017) Gudivada, T.; Vishnu Srinivasa Murthy, Y.V.S.; Koolagudi, S.G.
Ray tracing is the essential approach to estimating global illumination when rendering real scenarios. The process of ray tracing is computationally intensive because of the involvement of real-world scenes. Due to this, the quality of rendered images comes at the cost of execution time. Providing high-quality rendered images in less computational time is the motive for the present article. Controlling the depth of the light rays that are released from the eye source to the objects in the scene is one possible solution to achieve better speeds. To improve the quality of images and to reduce the computational time, in this work, POSIX threads have been used at the process level to parallelize the operation. An effort has been made to introduce a novel algorithm that finds the principal intersection between the objects in the scene and the light rays. As a result, a combinational hybrid model has been designed in order to reduce the computational time and to improve the quality of rendered images. Moreover, a comparison has been made with the state-of-the-art approach, and the proposed approach outperforms it. © INDIACom-2017.
Performance analysis of LPC and MFCC features in voice conversion using artificial neural networks
(Springer Verlag, 2017) Koolagudi, S.G.; Vishwanath, B.K.; Akshatha, M.; Vishnu Srinivasa Murthy, Y.V.S.
Voice conversion is a technique in which a source speaker’s voice is morphed into a target speaker’s voice by learning the source–target relationship from a number of utterances from the source and the target. Many applications may benefit from this sort of technology, for example, dubbing movies, TV shows, TTS systems, and so on. In this paper, the performance of an ANN-based voice conversion system is analyzed using linear predictive coding (LPC) and mel-frequency cepstral coefficients (MFCCs). Experimental results show that the voice conversion system based on LPC features is better than the one based on MFCC features. © Springer Science+Business Media Singapore 2017.
Vocal and Non-vocal Segmentation based on the Analysis of Formant Structure
(Institute of Electrical and Electronics Engineers Inc., 2018) Vishnu Srinivasa Murthy, Y.V.S.; Koolagudi, S.G.; Swaroop, V.G.
The process of classifying vocal and non-vocal regions in an audio clip is the base for many Music Information Retrieval (MIR) tasks. In this work, we have computed novel features based on formant structure for segmenting the vocal and non-vocal regions of a given music clip. Features such as obtuse angles at formant peaks, valley locations, convexity, and concavity have been proposed for this task after thorough analysis. The obtuse angles have been computed for the second, third, and fourth formants, as little discrimination is found for the first formant. The computed formant-related features have been added to the baseline mel-frequency cepstral coefficients (MFCCs) in order to improve the performance. Moreover, the singer formant (F5) has also been computed, forming a 19-dimensional feature vector. As artificial neural networks (ANNs) are more suitable for handling nonlinear data, they have been considered as the classifier. Further, an 11-point moving window has been applied to avoid intermittent misclassifications. An accuracy of 88% is obtained using the proposed approach with the 19-dimensional feature vector. © 2017 IEEE.
Academic Curriculum Load Balancing using GA
(Institute of Electrical and Electronics Engineers Inc., 2019) Chakradhar, M.; Charan, M.S.; Sai, R.U.; Kunal, M.; Vishnu Srinivasa Murthy, Y.V.S.; Koolagudi, G.K.
In this paper, we propose an algorithm using a genetic algorithm to find the optimal solution for the academic load balancing problem. The load balancing problem is to optimize the load of credits per semester in an academic curriculum. The objective is to distribute the credit load among all the semesters as evenly as possible, so that the deviation from the mean credit load per semester is minimal. The proposed approach explores the solution space using only mutation operators and does not use crossover, as crossover does not create any newer and better solutions in the solution space. The algorithm is applied to three data sets and the results are compared with the solutions obtained using the existing approaches. The results obtained using the proposed approach are either better than the existing approaches or on par with the state-of-the-art optimal solutions. The solution set obtained using the proposed approach is well spread throughout all the periods, and all the periods contain close to the mean number of credits. © 2019 IEEE.
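As a rough illustration of the mutation-only search described above, the following sketch assigns courses to semesters and minimizes the spread of credit loads. It is not the authors' algorithm: the standard deviation as the fitness measure, the elitist selection, the parameter values, and all names (`semester_loads`, `fitness`, `mutate`, `evolve`) are assumptions, and prerequisite or load-bound constraints are ignored.

```python
import random
from statistics import pstdev

def semester_loads(assignment, credits, n_semesters):
    """Total credits per semester for a course-to-semester assignment."""
    loads = [0] * n_semesters
    for course, sem in enumerate(assignment):
        loads[sem] += credits[course]
    return loads

def fitness(assignment, credits, n_semesters):
    """Assumed objective: standard deviation of semester loads (lower is better)."""
    return pstdev(semester_loads(assignment, credits, n_semesters))

def mutate(assignment, n_semesters, rng):
    """Move one randomly chosen course to a randomly chosen semester."""
    child = list(assignment)
    child[rng.randrange(len(child))] = rng.randrange(n_semesters)
    return child

def evolve(credits, n_semesters, pop_size=30, generations=300, seed=0):
    """Mutation-only evolutionary loop with elitist survivor selection (no crossover)."""
    rng = random.Random(seed)
    pop = [[rng.randrange(n_semesters) for _ in credits] for _ in range(pop_size)]
    for _ in range(generations):
        children = [mutate(p, n_semesters, rng) for p in pop]
        # Keep the best pop_size individuals among parents and children.
        pop = sorted(pop + children,
                     key=lambda a: fitness(a, credits, n_semesters))[:pop_size]
    return pop[0]

# Hypothetical curriculum: 12 courses with made-up credit values, 4 semesters.
credits = [4, 3, 3, 2, 4, 3, 1, 2, 3, 4, 2, 3]
best = evolve(credits, n_semesters=4)
loads = semester_loads(best, credits, 4)
```

Because parents compete with their mutated children, the best fitness in the population never gets worse from one generation to the next.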
Objective Assessment of Pitch Accuracy in Equal-Tempered Vocal Music Using Signal Processing Approaches
(Springer, 2020) Biswas, R.; Vishnu Srinivasa Murthy, Y.V.S.; Koolagudi, S.G.; Vishnu, S.G.
This paper presents an approach for objectively assessing the pitch in vocal monophonic music using various signal processing techniques. A database has been collected with 250 recordings containing both arohan and avarohan patterns rendered by 25 different singers for 10 Hindustani classical ragas. The fundamental frequency (F0) values of the user renditions are estimated and compared with the original pitch values to quantify the level of variation in pitch. Initially, a five-point moving window is applied to smooth the contour. Later, first-order and second-order differential techniques are applied to estimate the note onset. This process is computationally economical when compared with the available approaches. Cents are used to evaluate the variation between the target and sung pitch, as the cent is a unit of the most common tuning system for quantifying intonation in equal-tempered music. From this analysis, it is observed that singers with professional training have deviations within 15–20 cents, and non-musicians have deviations above 50 cents. Five expert singers rated the global pitch accuracy from the recordings, and these ratings were found to exhibit high correlation with the system’s assessments. Such an evaluation system, with quantitative analysis coupled with visual representation, will greatly aid the training process of singers. © 2020, Springer Nature Singapore Pte Ltd.
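For readers unfamiliar with the unit, the deviation in cents between a sung and a target frequency is conventionally 1200 times the base-2 logarithm of their ratio, so one equal-tempered semitone is 100 cents. A minimal sketch follows; the function name `cents_deviation` and the example frequencies are illustrative and not taken from the paper.

```python
import math

def cents_deviation(f_sung_hz, f_target_hz):
    """Signed pitch deviation in cents; positive means the sung note is sharp."""
    return 1200.0 * math.log2(f_sung_hz / f_target_hz)

# An octave above the target is 1200 cents; one semitone is 100 cents.
octave = cents_deviation(880.0, 440.0)
semitone = cents_deviation(440.0 * 2 ** (1 / 12), 440.0)
```

Under this definition, the 15–20 cent deviations reported for trained singers correspond to well under a quarter of a semitone.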
Process Logo: An Approach for Control-Flow Visualization of Information System Process in Process Mining
(Springer Science and Business Media Deutschland GmbH, 2022) Manoj Kumar, M.V.; Bs, B.S.; Sneha, H.R.; Thomas, L.; Annappa, B.; Vishnu Srinivasa Murthy, Y.V.S.
This paper proposes a new technique named “Process Logo” for visualizing the causal relationship between the activities of a process (control flow). Traditional process mining algorithms rely on representing the activity as a sequence of operations modeled using nodes and edges. As the number of activities increases, the representation of the entire control flow becomes quite tedious. Process Logo is a compact yet highly informative method for visually representing the process model. It visually summarizes the number of activities, sequence of execution, relative significance, and dependency between activities. It uses a dynamic programming method (sequence alignment) and a clustering approach with the Levenshtein measure as the distance measure. The proposed method is evaluated on a synthetic event log, and the experimental results are promising. © 2022, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
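The Levenshtein measure mentioned above can be sketched with the standard dynamic programming recurrence, here applied to traces represented as sequences of activity labels. How the paper encodes traces is not stated in the abstract, so the list-of-labels representation and the example traces are assumptions.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    # prev[j] holds the distance between the first i-1 items of a and the first j items of b.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]

# Hypothetical traces from an event log: the second one skips activity "B".
trace_1 = ["A", "B", "C", "D"]
trace_2 = ["A", "C", "D"]
distance = levenshtein(trace_1, trace_2)
```

A pairwise distance matrix built this way could then be passed to a clustering routine to group similar traces.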
