Conference Papers

Permanent URI for this collection: https://idr.nitk.ac.in/handle/123456789/28506

Contribution of Telugu vowels in identifying emotions
(Institute of Electrical and Electronics Engineers Inc., 2015) Shashidhar Koolagudi, G.; Shivakranthi, B.; Sreenivasa Rao, K.S.; Ramteke, P.B.
This work aims to identify the contribution of different vowels to emotion recognition in the Telugu language. Instead of processing the entire speech signal, we propose to focus only on the vowel parts of the utterance (/a/, /i/, /u/, /e/ and /o/). By analysing the vowels we can discriminate the emotions. In this work, spectral and prosodic features are used to study the effect of emotions on different vowels. Even though prosodic features are the best discriminators of emotions at the utterance level, spectral features are more useful at the phoneme level. One may observe that the same vowel exhibits different spectral behaviour when expressed in different emotions. Shimmer and jitter play a crucial role in classifying emotions using vowels. The semi-natural database used in this work is collected from Telugu movies. Gaussian Mixture Models (GMMs) are used as the classification models. The emotions considered in this work are anger, fear, happy, sad and neutral. The average emotion recognition performance obtained by combining MFCCs, formants, intensity, shimmer and jitter is around 78%. © 2015 IEEE.
Feature analysis for mispronounced phonemes in the case of alvoelar approximant (/r/) substituted with voiced dental consonant (/∂/)
(Institute of Electrical and Electronics Engineers Inc., 2015) Ramteke, P.B.; Koolagudi, S.G.; Prabhakar, A.
Mispronunciation is commonly observed in children aged 2 to 8 years. Some of the common mispronunciations are stopping, fronting, backing and affrication. These processes are known as phonological processes. Identifying these processes is crucial for studying the vocal tract development pattern and treating phonological disorders in children. The features that clearly discriminate a correctly pronounced phoneme from the corresponding mispronounced phoneme have to be compared to identify the phonological processes. This paper focuses on the analysis of the mispronounced alveolar approximant (/r/) substituted with the voiced fricative consonant (/∂/). In this work, spectral and pitch-related features are analysed using scatter plots and histograms. From the analysis, it is observed that the energy feature against the 2nd and 4th cepstral coefficients achieves 75% and 65% discrimination, respectively. © 2015 IEEE.
Recognition of repetition and prolongation in stuttered speech using ANN
(Springer Science and Business Media Deutschland GmbH, 2016) Savin, P.S.; Ramteke, P.B.; Koolagudi, S.G.
This paper focuses on detecting repetitions and prolongations in stuttered speech signals. Acoustic and pitch-related features such as Mel-frequency cepstral coefficients (MFCCs), formants, pitch, zero crossing rate (ZCR) and energy are tested for their effectiveness in recognizing repetitions and prolongations in stuttered speech. Artificial Neural Networks (ANNs) are used as the classifier. The results are evaluated using combinations of different features. They show that the ANN classifier trained using MFCC features achieves an average accuracy of 87.39% for repetition and prolongation recognition. © Springer India 2016.
Repetition detection in stuttered speech
(Springer Science and Business Media Deutschland GmbH, 2016) Ramteke, P.B.; Koolagudi, S.G.; Afroz, F.
This paper focuses on the detection of repetitions in stuttered speech. The stuttered speech signal is divided into isolated units based on energy. Mel-frequency cepstral coefficients (MFCCs), formants and shimmer are used as features for repetition recognition. These features are extracted from each isolated unit. Using Dynamic Time Warping (DTW), the features of each isolated unit are compared with those of subsequent units within a one-second interval of speech. A threshold is set based on the analysis of the scores obtained from DTW; if the score is below the threshold, the units are identified as repeated events. The twenty-seven seconds of speech data used in this work contain 50 repetition events. The results show that the combination of MFCCs, formants and shimmer can be used for the recognition of repetitions in stuttered speech. Out of 50 repetitions, 47 are correctly identified. © Springer India 2016.
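The comparison step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `dtw_distance` and `find_repetitions`, the use of Euclidean frame distance, and the reading of "within one second" as a difference between unit start times are all assumptions made here for illustration. Feature extraction (MFCCs, formants, shimmer) and energy-based segmentation are assumed to have been done already.

```python
import math


def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature vectors.

    Uses Euclidean distance between frames (an assumption; the abstract
    does not state the local distance measure).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            # Standard step pattern: insertion, deletion, match.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def find_repetitions(units, threshold, max_gap_s=1.0):
    """Return index pairs (i, j) of units whose DTW score is below threshold.

    units: list of (start_time_s, frames) tuples in time order, where
    frames is a list of equal-length feature vectors (hypothetical layout).
    """
    hits = []
    for i, (t_i, f_i) in enumerate(units):
        for j in range(i + 1, len(units)):
            t_j, f_j = units[j]
            if t_j - t_i > max_gap_s:
                break  # later units are even further away in time
            if dtw_distance(f_i, f_j) < threshold:
                hits.append((i, j))
    return hits
```

In this sketch the score is the raw accumulated cost, so longer units tend to score higher; a practical system would likely normalise by path length before thresholding.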
Efficient audio segmentation in soccer videos
(Institute of Electrical and Electronics Engineers Inc., 2016) Raghuram, M.A.; Chavan, N.R.; Koolagudi, S.G.; Ramteke, P.B.
Identifying different audio segments in videos is the first step for many important tasks such as event detection and speech transcription. Approaches using Mel-frequency cepstral coefficients (MFCCs) with Gaussian mixture models (GMMs) and hidden Markov models (HMMs) perform reasonably well in stationary conditions but do not scale to a broad range of environmental conditions. This paper focuses on segmenting the audio of broadcast soccer videos into classes such as Silence, Speech Only, Speech Over Crowd, Crowd Only and Excited, using an alternative feature set that is simple as well as robust to changes in environmental conditions. Support Vector Machines (SVMs), Neural Networks and Random Forest are used for the classification. The accuracies achieved with SVMs, Neural Networks and Random Forest are 83.80%, 86.07% and 88.35%, respectively. The proposed features with the Random Forest classifier are found to achieve better accuracy than the other classifiers. © 2016 IEEE.
Text-independent automatic accent identification system for Kannada language
(Springer Verlag, 2017) Soorajkumar, R.; Girish, G.N.; Ramteke, P.B.; Joshi, S.S.; Koolagudi, S.G.
Accent identification is one of the applications receiving increasing attention in speech processing. A text-independent accent identification system is proposed using Gaussian mixture models (GMMs) for the Kannada language. Spectral and prosodic features such as Mel-frequency cepstral coefficients (MFCCs), pitch, and energy are considered for the experimentation. The dataset is collected from three regions of Karnataka, namely Mumbai Karnataka, Mysore Karnataka, and Karavali Karnataka, which have significant variations in accent. Experiments are conducted using 32 speech samples from each region, where each clip is one minute long and spoken by native speakers. The baseline system implemented using MFCC features is found to achieve 76.7% accuracy. From the results it is observed that the hybrid features improve the performance of the system by 3%. © Springer Science+Business Media Singapore 2017.
Selective cropper for geometrical objects in openflipper
(Springer Verlag, 2017) Maonica, B.; Das, P.; Ramteke, P.B.; Koolagudi, S.G.
Computer graphics remains one of the most exciting and rapidly growing fields of computing, and geometry processing is a major part of it. Every element in graphics can be processed using different algorithms for acquisition, reconstruction, analysis, manipulation, simulation, and transition of simple, primitive, and complex structures. One commonly used function is the cropping/clipping of geometrical objects. In this paper, an approach is proposed for cropping a 3D object. The algorithm allows users to crop out selected portions of geometrical objects based on constraints such as the axis, position, and amount to be cropped. The proposed algorithm is provided as a plugin to the open-source software OpenFlipper, and the results of the crop algorithm are presented. © Springer Science+Business Media Singapore 2017.
Identification of Voicing Assimilation From Children’s Speech
(Institute of Electrical and Electronics Engineers Inc., 2017) Ramteke, P.B.; Madugula, M.; Suresh, S.; Koolagudi, S.G.
In this paper, an attempt is made at the automatic identification of the voicing assimilation or harmony process. In these processes, voiced sounds are replaced by unvoiced sounds and vice versa. The phonological processes that appear in children reflect their age-wise speech learning ability, and the processes start to disappear as children grow. Speech Language Pathologists (SLPs) analyse these processes to evaluate the learning ability of children. Pitch is present in voiced speech and absent in unvoiced regions of speech. This gives a clear view of the assimilation; hence pitch is explored for the identification of voicing assimilation. Features extracted from the test words (mispronounced words) are compared with the reference/correct words using Dynamic Time Warping (DTW), and the region of mispronunciation is identified from the properties of the DTW curve. The highest accuracy achieved in identifying voicing assimilation using the pitch feature is 88%. © INDIACom-2017.
Ensuring performance of graphics processing units: A programmer’s perspective
(Springer Verlag, 2017) Varshney, M.; Koolagudi, S.G.; Velusamy, S.; Ramteke, P.B.
This paper focuses on the use of an automation system for ensuring the performance of the graphics driver created at Intel Corporation. The automation tool uses a client-server architecture that developers or validation engineers can use to check whether the graphics drivers are programmed and modified correctly. The tool additionally implements some of the Driver Private APIs (which allow any application to talk directly with the driver) to verify the properties of features that are not supported by the Operating System (OS). © Springer Science+Business Media Singapore 2017.
Video Stabilization Using Sliding Frame Window
(Springer Verlag, 2017) Shagrithaya, K.S.; Gurushankar, E.; Srikanth, D.; Ramteke, P.B.; Koolagudi, S.G.
Shaky videos are visually unappealing to viewers. Digital video stabilization is a technique to compensate for unwanted camera motion and produce a video that looks relatively stable. In this paper, an approach for video stabilization is proposed that estimates a trajectory by calculating the motion between consecutive frames, using the Shi-Tomasi corner detection and optical flow algorithms, over the entire length of the video. The trajectory is then smoothed using a moving average to give a stabilized output. A smoothing radius is defined, which determines the smoothness of the resulting video. Automatically deciding this parameter’s value is also discussed. The stabilization results of the proposed approach are observed to be comparable with the state-of-the-art YouTube stabilization. © 2017, Springer International Publishing AG.
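The trajectory-smoothing step in this abstract can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the names `build_trajectory` and `smooth_trajectory` are hypothetical, motion is reduced to a single component (for example horizontal translation per frame), and the per-frame motion values are assumed to come from an upstream corner-tracking and optical-flow stage that is not shown.

```python
def build_trajectory(motions):
    """Accumulate per-frame motion into a camera trajectory (running sum)."""
    total = 0.0
    trajectory = []
    for step in motions:
        total += step
        trajectory.append(total)
    return trajectory


def smooth_trajectory(trajectory, radius):
    """Moving average with a window of up to 2 * radius + 1 samples.

    The window is centred on each frame and truncated at the ends of the
    video (one possible boundary choice; the abstract does not specify it).
    """
    n = len(trajectory)
    smoothed = []
    for i in range(n):
        lo = max(0, i - radius)
        hi = min(n, i + radius + 1)
        window = trajectory[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Correction per frame: how far each frame must be shifted so that the
# camera follows the smoothed path instead of the shaky one.
motions = [1.0, -1.0, 2.0, 0.0, -2.0]
trajectory = build_trajectory(motions)
smoothed = smooth_trajectory(trajectory, radius=1)
corrections = [s - t for s, t in zip(smoothed, trajectory)]
```

A larger `radius` averages over more frames and gives a steadier result at the cost of larger corrections, which matches the abstract's remark that the smoothing radius determines the smoothness of the output.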
