Faculty Publications
Permanent URI for this community: https://idr.nitk.ac.in/handle/123456789/18736
Publications by NITK Faculty
5 results
Search Results
Item
Prediction of aesthetic elements in Karnatic music: A machine learning approach (International Speech Communication Association, publication@isca-speech.org, 4 Rue des Fauvettes - Lous Tourils, Baixas 66390, 2018) Rajan, M.; Vijayakumar, A.; Vijayasenan, D.
Gamakas, the embellishments and ornamentations used to enhance musical experience, are defining features of Karnatic Music (KM). The appropriateness of using gamaka is determined by aesthetics and is often developed by musicians with experience. Therefore, understanding and modeling gamaka is a significant bottleneck in applications like music synthesis, automatic accompaniment, etc. in the context of KM. To this end, we propose to learn both the presence and the type of gamaka in a data-driven manner using annotated symbolic music. In particular, we explore the efficacy of three classes of features - note-based, phonetic and structural - and train a Random Forest Classifier to predict the existence and the type of gamaka. The observed accuracy is ∼70% for gamaka detection and ∼60% for gamaka classification. Finally, we present an analysis of the features and find that frequency and duration of the neighbouring notes prove to be the most important features. © 2018 International Speech Communication Association. All rights reserved.

Item
Singing Voice Synthesis System for Carnatic Music (Institute of Electrical and Electronics Engineers Inc., 2018) Rajan, M.
Singing Voice Synthesis systems take speech, lyric and note information as inputs, and produce songs as the output. For converting speech to song, the duration and pitch of the speech need to be modified to match the desired pitch and duration of the song. In this paper, we propose a baseline speech to singing voice synthesis system for Carnatic music. We synthesize two popular Carnatic songs from the flat pitched recordings of a vowel sound. Pitch of the input sound is modified according to the frequencies of notes present in the original songs.
To avoid abrupt pitch changes, transitions between adjacent notes are smoothed using a sinusoid-based function. To add naturalness to the synthesized song, fluctuations present in the input speech are retained. A Harmonic plus Noise Model is used to synthesize the songs. Subjective evaluation is performed by ten listeners, and the Mean Opinion Scores for the songs are found to be 3.1 and 3. © 2018 IEEE.

Item
Nisp: A multi-lingual multi-accent dataset for speaker profiling (Institute of Electrical and Electronics Engineers Inc., 2021) Kalluri, S.B.; Vijayasenan, D.; Ganapathy, S.; Rajan, M.; Krishnan, P.
Many commercial and forensic applications of speech demand the extraction of information about speaker characteristics, which falls into the broad category of speaker profiling. The speaker characteristics needed for profiling include physical traits such as height, age, and gender, along with the speaker's native language. Many of the available datasets have only partial information for speaker profiling. In this paper, we attempt to overcome this limitation by developing a new dataset which has speech data from five different Indian languages along with English. Metadata for speaker profiling applications, such as linguistic information, regional information, and physical characteristics of a speaker, is also collected. We call this dataset the NITK-IISc Multilingual Multi-accent Speaker Profiling (NISP) dataset. The description of the dataset, potential applications, and baseline results for speaker profiling on this dataset are provided in this paper. © 2021 IEEE.

Item
Dehazing of Satellite Images using Adaptive Black Widow Optimization-based framework (Taylor and Francis Ltd., 2021) Suresh, S.; Rajan, M.; Pushparaj, J.; Cs, A.; Lal, S.; Chintala, C.S.
Haze is a common atmospheric disturbance that adversely affects the quality of optical data, thus often restricting their usability.
Since these effects are inherent in the process of spaceborne Earth sensing, it is important to develop effective methods to remove them. This work proposes a novel method for de-hazing satellite imagery and outdoor camera images. It is developed by modifying the transmission map used in the Dark Channel Prior (DCP) method. A Weighted Variance Guided Filter (WVGF) is introduced for enhancing the image quality, which includes a two-stage image decomposition and fusion process. The method also optimally combines the radiance and transmission components along with an additional stage modelling a fusion-based transparency function. A final guided filter-based image refinement scheme is incorporated to improve the processed image quality. The optimal tuning of the image-dependent parameters at various stages is achieved using the newly proposed Adaptive Black Widow Optimization (ABWO) algorithm, which makes the proposed de-hazing scheme fully automatic. Qualitative and quantitative performance analyses are carried out, and the results are compared with other state-of-the-art methods. The experimental results reveal that the proposed method performs better than the others, independent of the haze density, without losing the natural look of the scene. © 2021 Informa UK Limited, trading as Taylor & Francis Group.

Item
Enhanced JAYA optimization based medical image fusion in adaptive non subsampled shearlet transform domain (Elsevier B.V., 2022) Suresh, S.; Rajan, M.; Asha, C.S.; Shyam, L.
Multi-modal image fusion has gained popularity in the medical field as it helps doctors view diverse medical image modalities in a single image. Treatment can be planned effectively using the fused image, which helps doctors diagnose diseases. Medical image fusion aims to merge the texture features from multiple images in a single image.
The proposed method applies the Adaptive window-based Non-Subsampled Shearlet Transform (ANSST) to the source images to separate the low and high-frequency directional sub-bands. Further, an enhanced JAYA (EJAYA) optimization framework is utilized to obtain the adaptive weights for combining high-frequency sub-bands for multi-modal medical image fusion. The low-frequency bands are fused using the max rule based on the average energy of low-frequency sub-bands. The entire process focuses on preserving the low-frequency band's energy while improving the texture details in the combined image. In the end, inverse ANSST is applied to the merged low-frequency and high-frequency components to get the fused image. Extensive experiments are conducted on data sets obtained from the Brain Atlas website comprising more than 100 images. The significance of the current approach is validated by qualitative and quantitative assessments. The proposed method exhibits good performance in terms of subjective analysis compared to recent well-known image fusion techniques. © 2022 Karabuk University
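
The low-frequency fusion step mentioned in the last abstract (a max rule driven by average energy) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the average energy is computed, so the sketch assumes a per-pixel mean of squared coefficients over a small square window, and the names `local_average_energy`, `fuse_low_frequency`, and the `radius` parameter are hypothetical. Sub-bands are plain lists of lists standing in for the low-frequency outputs of a transform such as ANSST.

```python
def local_average_energy(band, radius=1):
    """Mean of squared coefficients in a (2*radius+1)^2 window, clipped at the borders.

    Assumption: 'average energy' is taken to be a local windowed mean;
    the source abstract does not specify this.
    """
    rows, cols = len(band), len(band[0])
    energy = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total, count = 0.0, 0
            for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
                for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                    total += band[rr][cc] ** 2
                    count += 1
            energy[r][c] = total / count
    return energy


def fuse_low_frequency(band_a, band_b, radius=1):
    """Max rule: at each position keep the coefficient from the sub-band
    whose local average energy is larger (ties go to band_a)."""
    energy_a = local_average_energy(band_a, radius)
    energy_b = local_average_energy(band_b, radius)
    return [
        [a if ea >= eb else b
         for a, b, ea, eb in zip(row_a, row_b, row_ea, row_eb)]
        for row_a, row_b, row_ea, row_eb in zip(band_a, band_b, energy_a, energy_b)
    ]


if __name__ == "__main__":
    low_a = [[1, 1], [1, 1]]
    low_b = [[2, 0], [0, 0]]
    # With radius=0 the rule reduces to comparing squared coefficients directly.
    print(fuse_low_frequency(low_a, low_b, radius=0))
```

Selecting whole coefficients, rather than averaging them, is one way to avoid diluting the energy of the stronger sub-band, which matches the abstract's stated goal of preserving low-frequency energy.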
