Browsing by Author "Ramteke, P.B."
Now showing 1 - 20 of 45
Item A Transfer Learning Approach for Diabetic Retinopathy Classification Using Deep Convolutional Neural Networks (Institute of Electrical and Electronics Engineers Inc., 2018) Krishnan, A.S.; Clive, D.R.; Bhat, V.; Ramteke, P.B.; Koolagudi, S.G.
Diabetic Retinopathy is a disease in which the retina is damaged due to diabetes mellitus, and it is a leading cause of blindness today. Detection and quantification of the disease from retinal images is tedious and requires expertise. In this paper, automatic identification of the severity of Diabetic Retinopathy using Convolutional Neural Networks (CNNs) with a transfer learning approach is proposed to aid the diagnostic process. Different CNN architectures, such as ResNet and Inception-ResNet-v2, are compared using the quadratic weighted kappa metric. Qualitative and quantitative evaluation of the proposed approach is carried out on the Diabetic Retinopathy detection dataset from Kaggle. From the results, we observe that the proposed model achieves a kappa score of 0.76. © 2018 IEEE.

Item Automated Evaluation of Attendance and Cumulative Feedback using Face Recognition (Institute of Electrical and Electronics Engineers Inc., 2018) Shalini, S.; Navya, R.S.; Neha, M.; Ramteke, P.B.; Koolagudi, S.G.
Face recognition is an important technological development of this era. It is widely used in biometric systems, in gaming, and for tagging people on social media. It is also used for attendance, because the manual system is tedious and time-consuming. This paper proposes an automated attendance and cumulative feedback system based on facial expression recognition. The proposed system recognizes students from a recorded video of the class and captures their attendance. Local Binary Pattern Histogram (LBPH) and Eigenface recognizers have been used for face recognition, with accuracies of 97% and 95% respectively. The paper also addresses feedback for the professor, deducing genuine, cumulative feedback from the facial expressions of the students. Two methods are proposed for deducing the feedback: an algorithmic method based on face recognition using a confidence measure for expression detection, and a method using Speeded-Up Robust Features (SURF) with Support Vector Machines (SVMs). The proposed methodology is observed to correlate with the conventional method of feedback evaluation. Copyright © INDIACom-2018.

Item Characterization of aspirated and unaspirated sounds in speech (Institute of Electrical and Electronics Engineers Inc., 2017) Ramteke, P.B.; Sadanand, A.; Koolagudi, S.G.; Pai, V.
In this work, consonant aspiration and unaspiration phenomena are studied. The pronunciation of aspirated and unaspirated consonants is characterized by the 'puff of air' released at the place of constriction in the vocal tract, known as the burst. Here, the properties of the vowel immediately after the burst are studied to characterize the burst. The excitation source signal estimated from the speech linear prediction residual is used for the task. Signal characteristics such as the glottal pulse; the durations of the open, closed and return phases; the slopes of the open and return phases; the duration of the burst; the ratio of the highest and lowest signal energies; and voice onset time (VOT) are explored to characterize aspiration. The TIMIT English speech corpus is used to test the proposed approach. Random forest (RF) and support vector machine (SVM) classifiers are used to test the effectiveness of the features, achieving accuracies of 99.93% and 94.03% respectively. From the results, it is observed that the proposed features are robust in classifying aspirated and unaspirated consonants. © 2017 IEEE.

Item Characterization of Consonant Sounds Using Features Related to Place of Articulation (Springer, 2020) Ramteke, P.B.; Hegde, S.; Koolagudi, S.G.
Speech sounds are classified into five classes grouped by place and manner of articulation: velar, palatal, retroflex, dental and labial. In this paper, an attempt has been made to explore the role of place of articulation and vocal tract length in characterizing the different classes of speech sounds. Formants and the vocal tract length available for the production of each class of sound are extracted from the region of transition from the consonant burst to the rising profile of the immediately following vowel. These features, along with their statistical variations, are considered for the analysis. Given the non-linear nature of the features, Random Forest (RF) is used for the classification. From the results, it is observed that the proposed features are efficient in discriminating velar from palatal, palatal from retroflex, and palatal from labial sounds, with accuracies of 92.9%, 93.83% and 94.07% respectively. © 2020, Springer Nature Singapore Pte Ltd.

Item Classification of aspirated and unaspirated sounds in speech using excitation and signal level information (Academic Press, 2020) Ramteke, P.B.; Supanekar, S.; Koolagudi, S.G.
In this work, consonant aspiration and unaspiration phenomena are studied.
The pronunciation of aspirated and unaspirated consonants is characterized by the 'puff of air' released at the place of constriction in the vocal tract, also known as the burst. Here, the properties of the vowel immediately after the burst are studied to characterize the burst. The excitation source signal, estimated from speech as the low-pass-filtered linear prediction residual, is used for the task. Signal characteristics such as the glottal pulse; the durations of the open, closed and return phases; the slopes of the open and return phases; the duration of the burst; the ratio of the highest and lowest frame-wise signal energies; and the voice onset point are explored as features to characterize aspiration. Three datasets, namely TIMIT, IIIT Hyderabad Marathi and IIIT Hyderabad Hindi (IIIT-H Indic Speech Databases), are used to verify the proposed approach. Random forest, support vector machine and deep feed-forward neural network (DFFNN) classifiers are used to test the effectiveness of the features. Optimal features are selected for classification using correlation-based feature selection (CFS). From the results, it is observed that the proposed features are efficient in classifying aspirated and unaspirated consonants. The performance of the proposed features in the recognition of aspirated and unaspirated phonemes is also evaluated on IIIT Hyderabad Marathi; recognition using the proposed features is observed to improve over an MFCC-based phoneme recognition system. © 2020 Elsevier Ltd.

Item Contribution of Telugu vowels in identifying emotions (Institute of Electrical and Electronics Engineers Inc., 2015) Koolagudi, S.G.; Shivakranthi, B.; Rao, K.S.; Ramteke, P.B.
This work is mainly aimed at identifying the emotion contribution of different vowels in the Telugu language. Instead of processing the entire speech signal, we propose to focus only on the vowel parts of the utterance (/a/, /i/, /u/, /e/ and /o/). By analysing the vowels, we can discriminate the emotions. In this work, spectral and prosodic features are used to study the effect of emotions on different vowels. Even though prosodic features are the best discriminators of emotions at the utterance level, at the phoneme level spectral features are more useful. One may observe that the same vowel exhibits different spectral behaviour when expressed in different emotions. Shimmer and jitter play a crucial role in classifying emotions using vowels. A semi-natural database collected from Telugu movies is used in this work. Gaussian Mixture Models (GMMs) are used as the mathematical models for classification. The emotions considered are anger, fear, happiness, sadness and neutral. The average emotion recognition performance obtained by combining MFCCs, formants, intensity, shimmer and jitter is around 78%. © 2015 IEEE.

Item Efficient audio segmentation in soccer videos (Institute of Electrical and Electronics Engineers Inc., 2016) Raghuram, M.A.; Chavan, N.R.; Koolagudi, S.G.; Ramteke, P.B.
Identifying different audio segments in videos is the first step for many important tasks such as event detection and speech transcription. Approaches using Mel-frequency cepstral coefficients (MFCCs) with Gaussian mixture models (GMMs) and hidden Markov models (HMMs) perform reasonably well in stationary conditions but do not scale to a broad range of environmental conditions. This paper focuses on segmenting the audio of broadcast soccer videos into classes such as Silence, Speech Only, Speech Over Crowd, Crowd Only and Excited, using an alternative feature set that is simple as well as robust to changes in environmental conditions. Support Vector Machines (SVMs), Neural Networks and Random Forest are used for the classification, achieving accuracies of 83.80%, 86.07% and 88.35% respectively. The proposed features with the Random Forest classifier are found to achieve better accuracy than the other classifiers. © 2016 IEEE.

Item Ensuring performance of graphics processing units: A programmer's perspective (Springer Verlag, 2017) Varshney, M.; Koolagudi, S.G.; Velusamy, S.; Ramteke, P.B.
This paper mainly focuses on the use of an automation system for ensuring the performance of graphics drivers created at Intel Corporation. The automation tool adopts a client-server architecture which can be used by developers or validation engineers to verify whether the graphics drivers are programmed and modified correctly. The tool additionally implements some of the Driver Private APIs (which allow an application to talk directly to the driver) to verify the properties of features that are not supported by the Operating System (OS). © Springer Science+Business Media Singapore 2017.

Item Estimation of Tyre Pressure from the Characteristics of the Wheel: An Image Processing Approach (Springer, 2020) Vineeth Reddy, V.B.; Ananda Rao, H.; Yeshwanth, A.; Ramteke, P.B.; Koolagudi, S.G.
Improper tyre pressure is a safety issue that falls prey to the ignorance of users, yet a drop in tyre pressure can reduce mileage, tyre life, vehicle safety and performance. In this paper, an approach is proposed to measure tyre pressure from an image of the wheel. The tyre pressure is classified into under-pressure and normal pressure using the load index, tyre type, tyre position and the ratio of compressed to uncompressed tyre radius. The efficiency of the features is evaluated using three classifiers, namely Random Forest, AdaBoost and Artificial Neural Networks. It is observed that the ratio of radii plays a major role in classifying the tyres. The proposed system can be used to obtain a rough idea of whether a tyre should be refilled. © 2020, Springer Nature Singapore Pte Ltd.

Item Feature analysis for mispronounced phonemes in the case of alveolar approximant (/r/) substituted with voiced dental consonant (/∂/) (Institute of Electrical and Electronics Engineers Inc., 2015) Ramteke, P.B.; Koolagudi, S.G.; Prabhakar, A.
Mispronunciation is commonly observed in children from 2 to 8 years of age. Some of the common mispronunciations are stopping, fronting, backing and affrication; these are known as phonological processes. Identifying these processes is crucial in studying the vocal tract development pattern and in treating phonological disorders in children. Features that clearly discriminate a correctly pronounced phoneme from the corresponding mispronounced phoneme have to be compared to identify the phonological processes. This paper focuses on the analysis of the mispronounced alveolar approximant (/r/) substituted with the voiced dental consonant (/∂/). Spectral and pitch-related features are considered for the analysis using scatter plots and histograms. From the analysis, it is observed that the energy feature against the 2nd and 4th cepstral coefficients achieves 75% and 65% discrimination respectively. © 2015 IEEE.

Item Gender Identification from Children's Speech (Institute of Electrical and Electronics Engineers Inc., 2018) Ramteke, P.B.; Dixit, A.A.; Supanekar, S.; Dharwadkar, N.V.; Koolagudi, S.G.
Children's speech is characterized by higher pitch and formant frequencies compared to adult speech. Gender identification from children's speech is difficult, as there is no significant difference between the acoustic properties of male and female children. Here, an attempt has been made to explore features efficient in discriminating gender from children's speech. Different combinations of spectral features such as Mel-frequency cepstral coefficients (MFCCs), ΔMFCCs and ΔΔMFCCs, formants, and linear predictive cepstral coefficients (LPCCs); shimmer and jitter; and prosodic features such as pitch and its statistical variations, along with Δpitch-related features, are explored. The features are evaluated using non-linear classifiers, namely Artificial Neural Networks (ANNs), Deep Neural Networks (DNNs) and Random Forest (RF). From the results, it is observed that RF achieves the highest accuracy, 84.79%, among the classifiers.
© 2018 IEEE.

Item Gender Identification using Spectral Features and Glottal Closure Instants (GCIs) (Institute of Electrical and Electronics Engineers Inc., 2019) Ramteke, P.B.; Supanekar, S.; Koolagudi, S.G.
Automatic identification of gender from speech may help to improve the performance of systems such as speaker and speech recognition, forensic analysis, and authentication. The difference in the physiological parameters of male and female vocal folds results in significant changes in their vocal fold vibration patterns. These changes can be characterized from the differences in the duration of their glottal closure. In this paper, an attempt has been made at gender recognition from speech using spectral features such as MFCCs and LPCCs; pitch (F0); and excitation source features such as glottal closure instants (GCIs) and their statistical variations. Western Michigan University's Gender dataset, collected from 93 speakers (45 male and 48 female), is used for experimentation. Random forests (RFs) and support vector machines (SVMs) are used to measure the performance of the proposed features. Random forest is observed to achieve an average frame-level accuracy of 96.908% using 13 MFCCs, 13 LPCCs, pitch (F0) and GCI statistics (5). SVM is observed to achieve an average accuracy of 98.607% using 13 MFCCs, 13 LPCCs and GCI statistics (5). From the results, it is observed that the proposed features are efficient in discriminating gender from speech. © 2019 IEEE.
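The diabetic retinopathy record above reports its result as a quadratic weighted kappa of 0.76. For reference, that metric compares an observed confusion matrix against chance agreement with squared-distance penalties. A minimal NumPy sketch (the function name and toy labels are illustrative, not the authors' code):

```python
import numpy as np

def quadratic_weighted_kappa(y_true, y_pred, n_classes):
    """Agreement between two integer ratings, penalizing squared rating distance."""
    O = np.zeros((n_classes, n_classes))  # observed confusion matrix
    for t, p in zip(y_true, y_pred):
        O[t, p] += 1
    idx = np.arange(n_classes)
    W = (idx[:, None] - idx[None, :]) ** 2 / (n_classes - 1) ** 2  # quadratic weights
    E = np.outer(O.sum(axis=1), O.sum(axis=0)) / O.sum()           # chance agreement
    return 1.0 - (W * O).sum() / (W * E).sum()

# Toy 5-grade example: one prediction off by a single grade.
kappa = quadratic_weighted_kappa([0, 1, 2, 2], [0, 1, 2, 1], n_classes=5)  # 0.8
```

Perfect agreement yields 1.0, chance-level agreement 0.0, so the 0.76 above indicates strong but imperfect grading consistency.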
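Many of the records above share one evaluation protocol: extract fixed-length acoustic feature vectors (MFCCs, formants, jitter/shimmer, GCI statistics), then train a Random Forest and report held-out accuracy. A minimal scikit-learn sketch of that protocol, with synthetic Gaussian data standing in for real 13-dimensional features (the data, dimensions, and hyperparameters are illustrative only):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic stand-in for frame-level features (e.g. 13 MFCCs):
# two classes whose feature distributions differ in mean.
X = np.vstack([rng.normal(0.0, 1.0, (200, 13)),
               rng.normal(2.0, 1.0, (200, 13))])
y = np.repeat([0, 1], 200)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
acc = accuracy_score(y_te, clf.predict(X_te))  # near-perfect on this easy synthetic data
```

Random Forest is a natural default in these papers because it handles non-linear feature interactions and mixed feature scales without standardization.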
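Several records above use pitch (F0) and its statistics as features. As a rough illustration of how a frame-level F0 estimate can be obtained, here is a simple autocorrelation method applied to a synthetic sine frame; it is an assumed, generic estimator, not the one used in these papers:

```python
import numpy as np

def estimate_f0(frame, sr, fmin=60.0, fmax=400.0):
    """Crude F0 estimate for one voiced frame: pick the autocorrelation
    peak whose lag lies in the plausible pitch-period range [sr/fmax, sr/fmin]."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]  # lags 0..N-1
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + int(np.argmax(ac[lo:hi + 1]))
    return sr / lag

# Synthetic 40 ms "voiced" frame at 120 Hz, standing in for real speech:
sr = 16000
t = np.arange(int(0.04 * sr)) / sr
f0 = estimate_f0(np.sin(2 * np.pi * 120.0 * t), sr)  # close to 120 Hz
```

Per-utterance statistics of such frame-level estimates (mean, range, Δpitch) are the kind of prosodic features the gender- and emotion-recognition entries above describe.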
