Conference Papers
Permanent URI for this collection: https://idr.nitk.ac.in/handle/123456789/28506
3 results
Item: Prediction of aesthetic elements in Karnatic music: A machine learning approach
(International Speech Communication Association, 2018)
Rajan, M.; Vijayakumar, A.; Vijayasenan, D.

Gamakas, the embellishments and ornamentations used to enrich the musical experience, are defining features of Karnatic Music (KM). The appropriateness of a gamaka is determined by aesthetics and is often developed by musicians through experience. Understanding and modelling gamakas is therefore a significant bottleneck for KM applications such as music synthesis and automatic accompaniment. To this end, we propose to learn both the presence and the type of gamaka in a data-driven manner from annotated symbolic music. In particular, we explore the efficacy of three classes of features (note-based, phonetic, and structural) and train a Random Forest classifier to predict the existence and the type of gamaka. The observed accuracy is ∼70% for gamaka detection and ∼60% for gamaka classification. Finally, we present an analysis of the features and find that the frequency and duration of the neighbouring notes are the most important features. © 2018 International Speech Communication Association. All rights reserved.

Item: Singing Voice Synthesis System for Carnatic Music
(Institute of Electrical and Electronics Engineers Inc., 2018)
Rajan, M.

Singing voice synthesis systems take speech, lyric, and note information as inputs and produce songs as output. To convert speech to song, the duration and pitch of the speech must be modified to match the desired pitch and duration of the song. In this paper, we propose a baseline speech-to-singing voice synthesis system for Carnatic music. We synthesize two popular Carnatic songs from flat-pitched recordings of a vowel sound. The pitch of the input sound is modified according to the frequencies of the notes present in the original songs.
To avoid abrupt pitch changes, transitions between adjacent notes are smoothed using a sinusoid-based function. To add naturalness to the synthesized song, the fluctuations present in the input speech are retained. A Harmonic plus Noise Model is used to synthesize the songs. Subjective evaluation is performed by ten listeners, and the Mean Opinion Scores for the two songs are 3.1 and 3. © 2018 IEEE.

Item: NISP: A multi-lingual multi-accent dataset for speaker profiling
(Institute of Electrical and Electronics Engineers Inc., 2021)
Kalluri, S.B.; Vijayasenan, D.; Ganapathy, S.; Rajan, M.; Krishnan, P.

Many commercial and forensic applications of speech demand the extraction of information about speaker characteristics, a task that falls into the broad category of speaker profiling. The characteristics needed for profiling include physical traits such as the speaker's height, age, and gender, along with the speaker's native language. Many available datasets contain only partial information for speaker profiling. In this paper, we attempt to overcome this limitation by developing a new dataset with speech data from five different Indian languages along with English. Metadata for speaker profiling applications, such as linguistic information, regional information, and physical characteristics of each speaker, is also collected. We call this dataset the NITK-IISc Multilingual Multi-accent Speaker Profiling (NISP) dataset. A description of the dataset, its potential applications, and baseline speaker-profiling results are provided in this paper. © 2021 IEEE.
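The gamaka-prediction approach of the first paper can be sketched in a few lines of scikit-learn. This is a minimal illustration only: the synthetic data, the exact feature set, and all hyperparameters are assumptions, not the authors' actual dataset or configuration; only the overall setup (note-based features into a Random Forest, with feature importances inspected afterwards) follows the abstract.

```python
# Sketch of gamaka detection with a Random Forest (illustrative only).
# The features below are hypothetical note-based descriptors; the paper
# reports that neighbouring-note frequency and duration matter most.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
X = np.column_stack([
    rng.uniform(100, 600, n),  # note frequency (Hz)
    rng.uniform(0.1, 1.0, n),  # note duration (s)
    rng.uniform(100, 600, n),  # previous-note frequency (Hz)
    rng.uniform(0.1, 1.0, n),  # previous-note duration (s)
    rng.uniform(100, 600, n),  # next-note frequency (Hz)
    rng.uniform(0.1, 1.0, n),  # next-note duration (s)
])
y = rng.integers(0, 2, n)      # 1 = gamaka present, 0 = absent

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
accuracy = clf.score(X_te, y_te)             # held-out accuracy
importances = clf.feature_importances_       # per-feature importance
```

With real annotated symbolic music in place of the random data, `importances` is what would reveal the dominance of the neighbouring-note features reported in the abstract.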

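The note-transition smoothing described in the second paper can be sketched with NumPy. The abstract says only that transitions are smoothed with a sinusoid-based function; the raised-cosine glide, the transition length, and the example note values below are assumptions made for illustration.

```python
# Sketch of sinusoid-based smoothing of note transitions in a pitch
# contour (raised-cosine glide is an assumed choice of sinusoid).
import numpy as np

def pitch_contour(notes, fs=100, trans=0.08):
    """Piecewise-constant pitch track with smoothed note boundaries.

    notes: list of (frequency_hz, duration_s) pairs
    fs:    contour sample rate (samples per second)
    trans: transition length in seconds, centred on each boundary
    """
    segs = [np.full(int(d * fs), f, dtype=float) for f, d in notes]
    contour = np.concatenate(segs)
    half = int(trans * fs / 2)
    boundary = 0
    for seg in segs[:-1]:
        boundary += len(seg)
        lo, hi = boundary - half, boundary + half
        f0, f1 = contour[lo - 1], contour[hi]
        # Replace the abrupt step with a half-cosine glide from f0 to f1.
        t = np.linspace(0.0, np.pi, hi - lo)
        contour[lo:hi] = f0 + (f1 - f0) * (1.0 - np.cos(t)) / 2.0
    return contour

# Three hypothetical notes, 0.5 s each, rising in pitch.
c = pitch_contour([(240.0, 0.5), (270.0, 0.5), (300.0, 0.5)])
```

The resulting contour would then drive pitch modification of the flat-pitched vowel recording before Harmonic plus Noise Model resynthesis.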