Conference Papers

Permanent URI for this collection: https://idr.nitk.ac.in/handle/123456789/28506

Now showing 1 - 10 of 54

Characterization of Consonant Sounds Using Features Related to Place of Articulation
(Springer, 2020) Ramteke, P.B.; Hegde, S.; Koolagudi, S.G.
Speech sounds are classified into 5 classes, grouped based on place and manner of articulation: velar, palatal, retroflex, dental and labial. In this paper, an attempt has been made to explore the role of place of articulation and vocal tract length in characterizing the different classes of speech sounds. Formants and vocal tract length available for the production of each class of sound are extracted from the region of transition from consonant burst to the rising profile of the immediately following vowel. These features along with their statistical variations are considered for the analysis. Because of the non-linear nature of the features, Random Forest (RF) is used for the classification. The results show that the proposed features are efficient in discriminating the consonant classes velar and palatal, palatal and retroflex, and palatal and labial, with accuracies of 92.9%, 93.83% and 94.07% respectively. © 2020, Springer Nature Singapore Pte Ltd.
Objective Assessment of Pitch Accuracy in Equal-Tempered Vocal Music Using Signal Processing Approaches
(Springer, 2020) Biswas, R.; Vishnu Srinivasa Murthy, Y.V.S.; Koolagudi, S.G.; Vishnu, S.G.
This paper presents an approach for objectively assessing pitch in monophonic vocal music using various signal processing techniques. A database of 250 recordings has been collected, containing both arohan and avarohan patterns rendered by 25 different singers for 10 Hindustani classical ragas. The fundamental frequency (F0) values of the user renditions are estimated and compared with the original pitch values to quantify the level of variation in pitch. Initially, a five-point moving window is used to smooth the contour. Later, first-order and second-order differential techniques are applied to estimate the note onsets. This process is computationally economical compared with the available approaches. Cents are used to evaluate the variation between the target and sung pitch, since the cent is a standard unit for quantifying intonation in equal-tempered music. From this analysis, it is observed that singers with professional training have deviations within 15–20 cents, and non-musicians have deviations above 50 cents. Five expert singers rated the global pitch accuracy from the recordings, and these ratings were found to correlate highly with the system's assessments. Such an evaluation system, with quantitative analysis coupled with visual representation, will greatly aid the training of singers. © 2020, Springer Nature Singapore Pte Ltd.
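As a rough illustration of the smoothing and cent measures named in this abstract, the sketch below applies a five-point moving average to an F0 contour and computes the deviation from a target pitch with the standard definition, 1200 times the base-2 logarithm of the frequency ratio. The function names, the edge handling and the sample contour are assumptions for illustration, not the authors' implementation.

```python
import math


def moving_average(values, window=5):
    """Smooth a contour with a centered moving window (shrunk at the edges)."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def cents_deviation(f_sung, f_target):
    """Deviation in cents; 100 cents is one equal-tempered semitone."""
    return 1200.0 * math.log2(f_sung / f_target)


# Hypothetical F0 contour (Hz) for a note whose target pitch is 220 Hz.
contour = [218.0, 221.0, 223.0, 222.0, 221.0, 220.0, 222.0]
smoothed = moving_average(contour)
deviations = [cents_deviation(f, 220.0) for f in smoothed]
mean_abs_dev = sum(abs(d) for d in deviations) / len(deviations)
```

A per-note mean absolute deviation such as `mean_abs_dev` could then be compared against the 15–20 cent and 50 cent ranges reported in the abstract.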
Sentence-Based Dialect Identification System Using Extreme Gradient Boosting Algorithm
(Springer, 2020) Chittaragi, N.B.; Koolagudi, S.G.
In this paper, a dialect identification system (DIS) is proposed that explores dialect-specific prosodic features and cepstral coefficients from sentence-level utterances. People belonging to a specific region commonly share a unique speaking style, known as a dialect. Sentence-level speech units are chosen for dialect identification since sentences are observed to follow distinctive intonation and energy patterns. Sentences are derived from the standard Intonational Variations in English (IViE) speech dataset. Pitch and energy contours are used to derive intonation and energy features respectively, using a Legendre polynomial fit along with five statistical features. Further, Mel frequency cepstral coefficients (MFCCs) are added to capture dialect-specific spectral information. The Extreme Gradient Boosting (XGB) ensemble method is employed to evaluate the system with individual features and with combinations of features. The results indicate that both prosodic and spectral features contribute to dialect recognition, and the combined feature vectors give a better DIS performance of about 89.6%. © 2020, Springer Nature Singapore Pte Ltd.
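The Legendre-fit features described in this abstract can be sketched with NumPy's `numpy.polynomial.legendre.legfit`. The polynomial degree, the choice of five statistics and the sample contour below are assumptions, since the abstract does not specify them; this is an illustration rather than the authors' feature extractor.

```python
import numpy as np
from numpy.polynomial import legendre


def contour_features(contour, degree=3):
    """Return Legendre coefficients plus five summary statistics of a contour."""
    y = np.asarray(contour, dtype=float)
    # Map the frame index onto [-1, 1], the natural domain of Legendre polynomials.
    x = np.linspace(-1.0, 1.0, len(y))
    coeffs = legendre.legfit(x, y, degree)
    # The abstract does not name its five statistics; these are an assumption.
    stats = np.array([y.mean(), y.std(), y.min(), y.max(), np.median(y)])
    return np.concatenate([coeffs, stats])


# Hypothetical pitch contour (Hz) with a rising-falling shape.
pitch = 120.0 + 30.0 * np.sin(np.linspace(0.0, np.pi, 50))
features = contour_features(pitch)
```

Fitting on a fixed [-1, 1] axis keeps the coefficients comparable across sentences of different lengths, which is one reason a polynomial fit is convenient for contour shape features.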
Note Transcription from Carnatic Music
(Springer, 2020) Suma, S.M.; Koolagudi, S.G.; Ramteke, P.B.; Sreenivasa Rao, K.S.
In this work, an effort has been made to identify the note sequences of different ragas of Carnatic music. The proposed heuristic method uses the standard just-intonation frequency ratios between notes for basic transcription of a music piece into a written sequence of notes. The notes present in a given piece of music are obtained using pitch histograms. The normalized pitch contour of the piece is segmented based on detection of the note boundaries. These segments are labeled using the note information already available. Without prior knowledge of the raga, 30 out of 64 sequences are identified accurately and an additional 18 sequences are identified with one note error. With prior knowledge of the raga, 76.56% accuracy is observed in note sequence identification. © 2020, Springer Nature Singapore Pte Ltd.
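The idea of labeling pitch segments with just-intonation ratios can be sketched as follows. The ratio table is one commonly cited set and is an assumption, as are the function names and the use of the segment median; the abstract does not give these details, so this is not the authors' method.

```python
import math
from fractions import Fraction

# One commonly cited set of just-intonation ratios relative to the tonic (Sa).
# The abstract does not list its ratios, so this table is an assumption.
JUST_RATIOS = {
    "S": Fraction(1, 1), "R1": Fraction(16, 15), "R2": Fraction(9, 8),
    "G2": Fraction(6, 5), "G3": Fraction(5, 4), "M1": Fraction(4, 3),
    "M2": Fraction(45, 32), "P": Fraction(3, 2), "D1": Fraction(8, 5),
    "D2": Fraction(5, 3), "N2": Fraction(9, 5), "N3": Fraction(15, 8),
}


def nearest_note(freq_hz, tonic_hz):
    """Label a frequency with the closest just-intonation note, ignoring octave."""
    # Fold the frequency into a single octave, measured in octaves above the tonic.
    octave_pos = math.log2(freq_hz / tonic_hz) % 1.0

    def distance(name):
        d = abs(octave_pos - math.log2(float(JUST_RATIOS[name])))
        return min(d, 1.0 - d)  # wrap around so pitches just below Sa map to Sa

    return min(JUST_RATIOS, key=distance)


def transcribe(segment_pitches, tonic_hz):
    """Map the median pitch of each segment to a note label."""
    labels = []
    for seg in segment_pitches:
        ordered = sorted(seg)
        labels.append(nearest_note(ordered[len(ordered) // 2], tonic_hz))
    return labels
```

For example, `transcribe(segments, tonic_hz)` takes a list of per-segment pitch lists in Hz together with a known tonic; restricting `JUST_RATIOS` to the notes of a known raga would correspond to the prior-knowledge case mentioned in the abstract.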
Estimation of Tyre Pressure from the Characteristics of the Wheel: An Image Processing Approach
(Springer, 2020) Vineeth Reddy, V.B.; Ananda Rao, H.; Yeshwanth, A.; Ramteke, P.B.; Koolagudi, S.G.
Improper tyre pressure is a safety issue that users often ignore. A drop in tyre pressure can reduce mileage, tyre life, vehicle safety and performance. In this paper, an approach is proposed to estimate the tyre pressure from an image of the wheel. The tyre pressure is classified as under pressure or normal pressure using the load index, tyre type, tyre position and the ratio of compressed to uncompressed tyre radius. The effectiveness of the features is evaluated using three classifiers, namely Random Forest, AdaBoost and Artificial Neural Networks. It is observed that the ratio of radii plays a major role in classifying the tyres. The proposed system can be used to obtain a rough idea of whether the tyre should be refilled. © 2020, Springer Nature Singapore Pte Ltd.
Image Colorization Using GANs and Perceptual Loss
(Institute of Electrical and Electronics Engineers Inc., 2020) Sankar, R.; Nair, A.; Abhinav, P.; Mothukuri, S.K.P.; Koolagudi, S.G.
Image colorization is useful for several applications, such as restoring old images and storing images in grayscale, which takes up less space, for later colorization. The problem is hard because many color combinations are possible for a given grayscale image. Recent work has aimed to solve this problem using deep learning, but achieving good performance requires highly processed inputs along with additional elements such as semantic maps. In this paper, an attempt has been made to generalize the colorization procedure using a conditional Deep Convolutional Generative Adversarial Network (DCGAN) with an added "Perceptual Loss". The network is trained on the CIFAR-100 dataset. The results of the proposed generative model with perceptual loss are compared with existing state-of-the-art systems: a normal GAN model and a U-Net convolutional model. © 2020 IEEE.
Kannada Dialect Classification using Artificial Neural Networks
(Institute of Electrical and Electronics Engineers Inc., 2020) Mothukuri, S.K.P.; Hegde, P.; Chittaragi, N.B.; Koolagudi, S.G.
In this paper, an Automatic Dialect Classification (ADC) system is proposed for dialects of the Kannada language (the Dravidian language spoken in Southern Karnataka). The ADC system extracts spectral Mel Frequency Cepstral Coefficients (MFCCs) and log filter bank features along with linear predictive coefficients. In addition, prosodic pitch and energy features are extracted to capture dialect-specific cues. A Kannada dialect speech corpus consisting of five prominent dialects of the Kannada language is used for designing the ADC system. Artificial Neural Networks (ANNs) are used to classify the Kannada dialects, since ANNs and their variants have recently been gaining popularity in speech processing applications. Hyperparameter tuning of the ANN has resulted in an increase in performance. © 2020 IEEE.
Efficient Traffic Signboard Recognition System Using Convolutional Networks
(Springer, 2020) Mothukuri, S.K.P.; Tejas, R.; Patil, S.; Darshan, V.; Koolagudi, S.G.
In this paper, a smart automatic traffic sign recognition system is proposed. Signboard recognition plays a vital role in the automated driving systems of transport vehicles. The model is built on a convolutional neural network. The German Traffic Sign Detection Benchmark (GTSDB), a standard open-source segmented image dataset with forty-three different signboard classes, is used for experimentation. The implementation focuses on processing speed and classification accuracy, so that the resulting model is suitable for real-time automated driving systems. Similar experiments are carried out with pre-trained convolutional models for comparison. The proposed model performs better in terms of response time. © Springer Nature Singapore Pte Ltd. 2020.
Retinal-Layer Segmentation Using Dilated Convolutions
(Springer Science and Business Media Deutschland GmbH, 2020) Guru Pradeep Reddy, T.; Ashritha, K.S.; Prajwala, T.M.; Girish, G.N.; Kothari, A.R.; Koolagudi, S.G.; Rajan, J.
Visualization and analysis of Spectral Domain Optical Coherence Tomography (SD-OCT) cross-sectional scans has gained a lot of importance in the diagnosis of several retinal abnormalities. Quantitative analytic techniques like retinal thickness and volumetric analysis are performed on cross-sectional images of the retina for early diagnosis and prognosis of retinal diseases. However, segmentation of retinal layers from OCT images is a complicated task on account of certain factors like speckle noise, low image contrast and low signal-to-noise ratio amongst many others. Owing to the importance of retinal layer segmentation in diagnosing ophthalmic diseases, manual segmentation techniques have been proposed and adopted in clinical practice. Nonetheless, manual segmentations suffer from erroneous boundary detection issues. This paper thus proposes a fully automated semantic segmentation technique that uses an encoder–decoder architecture to accurately segment the prominent retinal layers. © 2020, Springer Nature Singapore Pte Ltd.
Kannada Dialect Classification Using CNN
(Springer Science and Business Media Deutschland GmbH, 2020) Hegde, P.; Chittaragi, N.B.; Mothukuri, S.K.P.; Koolagudi, S.G.
Kannada is one of the prominent languages spoken in southern India. Since Kannada is a lingua franca spoken by more than 70 million people, it naturally has dialects. In this paper, we identify five major dialectal regions in Karnataka state. An attempt is made to classify these five dialects from sentence-level utterances. Sentences are segmented automatically from continuous speech using spectral centroid and short-term energy features. Mel frequency cepstral coefficient (MFCC) features are extracted from these sentence units and used to train a convolutional neural network (CNN). Along with MFCCs, shifted delta and double delta coefficients are also used to train the CNN model. The proposed CNN-based dialect recognition system is also tested on the internationally known standard Intonation Variation in English (IViE) dataset. The CNN model has resulted in better performance. It is observed that using one convolution layer and three fully connected layers balances computational complexity and gives better accuracy on both the Kannada and English datasets. © 2020, Springer Nature Switzerland AG.
