Faculty Publications

Permanent URI for this community: https://idr.nitk.ac.in/handle/123456789/18736

Publications by NITK Faculty

Search Results

Now showing 1 - 6 of 6

Contribution of Telugu vowels in identifying emotions
(Institute of Electrical and Electronics Engineers Inc., 2015) Shashidhar Koolagudi, G.; Shivakranthi, B.; Sreenivasa Rao, K.S.; Ramteke, P.B.
This work aims to identify how different vowels in the Telugu language contribute to emotion recognition. Instead of processing the entire speech signal, we propose to focus only on the vowel parts of the utterance (/a/, /i/, /u/, /e/ and /o/). By analysing the vowels, we can discriminate between the emotions. In this work, spectral and prosodic features are used to study the effect of emotions on different vowels. Even though prosodic features are the best discriminators of emotions at the utterance level, spectral features are more useful at the phoneme level. One may observe that the same vowel exhibits different spectral behaviour when expressed in different emotions. Shimmer and jitter play a crucial role in classifying emotions using vowels. The semi-natural database used in this work is collected from Telugu movies. Gaussian Mixture Models (GMMs) are used as the mathematical models for classification. The emotions considered in this work are anger, fear, happy, sad and neutral. The average emotion recognition performance obtained by combining MFCCs, formants, intensity, shimmer and jitter is around 78%. © 2015 IEEE.
Classification of vocal and non-vocal regions from audio songs using spectral features and pitch variations
(Institute of Electrical and Electronics Engineers Inc., 2015) Vishnu Srinivasa Murthy, Y.V.S.; Koolagudi, S.G.
In this work, an effort has been made to identify vocal and non-vocal regions in a given song using signal processing techniques and a machine learning algorithm. Initially, spectral features such as mel-frequency cepstral coefficients (MFCCs) are used to develop the baseline system. Statistical values of pitch, jitter and shimmer are then considered to improve the performance of the system. Artificial neural networks (ANNs) are used to capture the characteristics of the vocal and non-vocal segments of the songs. The experiment is conducted on 60 vocal and 60 non-vocal clips extracted from Telugu albums. An 11-point moving window is used to ensure the continuity of vocal and non-vocal segments, thus improving the accuracy of the system. With this approach, the system achieves 85.59% accuracy for vocal and 88.52% for non-vocal segment classification. © 2015 IEEE.
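The abstract does not say how the 11-point moving window is applied. One common reading is a majority vote over neighbouring frame labels, which removes isolated misclassifications. The sketch below illustrates that idea under this assumption; the function name `smooth_labels` and the 1 = vocal, 0 = non-vocal encoding are hypothetical and not taken from the paper.

```python
from typing import List


def smooth_labels(labels: List[int], window: int = 11) -> List[int]:
    """Majority-vote smoothing of binary frame labels (1 = vocal, 0 = non-vocal).

    Assumption: the moving window is a centred majority vote; the paper may
    use a different smoothing rule.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so that it has a centre frame")
    half = window // 2
    smoothed = []
    for i in range(len(labels)):
        # The window is truncated at the start and end of the sequence.
        lo = max(0, i - half)
        hi = min(len(labels), i + half + 1)
        segment = labels[lo:hi]
        ones = sum(segment)
        if 2 * ones > len(segment):
            smoothed.append(1)
        elif 2 * ones < len(segment):
            smoothed.append(0)
        else:
            # A tie can only occur in a truncated window; keep the original label.
            smoothed.append(labels[i])
    return smoothed


# A single non-vocal frame inside a vocal run is relabelled as vocal.
noisy = [1] * 10 + [0] + [1] * 10
cleaned = smooth_labels(noisy)
```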
Repetition detection in stuttered speech
(Springer Science and Business Media Deutschland GmbH, 2016) Ramteke, P.B.; Koolagudi, S.G.; Afroz, F.
This paper focuses on the detection of repetitions in stuttered speech. The stuttered speech signal is divided into isolated units based on energy. Mel-frequency cepstral coefficients (MFCCs), formants and shimmer are extracted from each isolated unit and used as features for repetition recognition. Using Dynamic Time Warping (DTW), the features of each isolated unit are compared with those of the subsequent units within a one-second interval of speech. A threshold is set based on an analysis of the DTW scores; if the score is below the threshold, the units are identified as repeated events. The twenty-seven seconds of speech data used in this work contain 50 repetition events. The results show that the combination of MFCCs, formants and shimmer can be used to recognise repetitions in stuttered speech. Out of 50 repetitions, 47 are correctly identified. © Springer India 2016.
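The comparison step described above can be sketched as follows. This is a minimal illustration of textbook DTW with a score threshold, not the authors' implementation. The names `dtw_distance` and `find_repetitions`, the Euclidean frame distance, the length normalisation and the example threshold are all assumptions made for the sketch.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[float]  # one feature vector (e.g. MFCCs, formants, shimmer)


def dtw_distance(a: Sequence[Frame], b: Sequence[Frame]) -> float:
    """Classic DTW cost between two feature sequences, using Euclidean frame distance."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def find_repetitions(
    units: List[Sequence[Frame]],
    start_times: List[float],
    threshold: float,
    max_gap: float = 1.0,
) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) whose normalised DTW score is below the threshold.

    Each unit is compared only with later units that start within `max_gap`
    seconds of it. Dividing by the combined length is an assumption, made so
    that one threshold can serve units of different durations.
    """
    pairs = []
    for i in range(len(units)):
        for j in range(i + 1, len(units)):
            if start_times[j] - start_times[i] > max_gap:
                break  # start_times are assumed sorted, so later units are further away
            score = dtw_distance(units[i], units[j]) / (len(units[i]) + len(units[j]))
            if score < threshold:
                pairs.append((i, j))
    return pairs


# Toy data: units 0 and 1 are identical, unit 2 is different.
toy_units = [[(0.0,), (1.0,)], [(0.0,), (1.0,)], [(5.0,), (6.0,)]]
toy_starts = [0.0, 0.3, 0.6]
repeated = find_repetitions(toy_units, toy_starts, threshold=0.1)
```

In practice the threshold would be chosen from the distribution of DTW scores on labelled data, as the abstract describes.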
Choice of a classifier, based on properties of a dataset: case study-speech emotion recognition
(Springer New York LLC, 2018) Koolagudi, S.G.; Vishnu Srinivasa Murthy, Y.V.S.; Bhaskar, S.P.
In this paper, a process for selecting a classifier based on the properties of a dataset is designed, since it is very difficult to try the data on an arbitrary number of classifiers. Speech emotion recognition is considered as a case study. Different combinations of spectral and prosodic features relevant to emotions are explored. The best subset of the chosen set of features is recommended for each classifier based on the properties of the chosen dataset. Various statistical tests are used to estimate the properties of the dataset. The nature of the dataset gives an idea of which classifier is relevant. To make the selection more precise, three other clustering and classification techniques, namely K-means clustering, vector quantization and artificial neural networks, are used for experimentation, and the results are compared with those of the selected classifier. Prosodic features such as pitch, intensity, jitter and shimmer, and spectral features such as mel-frequency cepstral coefficients (MFCCs) and formants, are considered in this work. Statistical parameters of prosody such as minimum, maximum, mean (μ) and standard deviation (σ) are extracted from speech and combined with basic spectral (MFCC) features to get better performance. Five basic emotions, namely anger, fear, happiness, neutral and sadness, are considered. To analyse the performance of different datasets on different classifiers, content- and speaker-independent emotional data collected from Telugu movies is used. The mean opinion score of fifty users is collected to label the emotional data. To generalize the conclusions, the benchmark IIT-Kharagpur emotional database is also used. © 2018, Springer Science+Business Media, LLC, part of Springer Nature.
RIS Assisted Triple-Hop RF-FSO Convergent With UWOC System
(Institute of Electrical and Electronics Engineers Inc., 2022) Bhargava Kumar, L.B.; Naik, R.P.; Krishnan, P.; Raj, A.A.B.; Majumdar, A.K.; Chung, W.-Y.
The convergence of wireless optical communication (WOC) and radio-frequency (RF) systems is a promising technology that overcomes the shortcomings of standalone communication systems. By incorporating reconfigurable intelligent surfaces (RISs) on top of these WOC and RF communication systems, it is possible to circumvent the connection challenges associated with standard line-of-sight (LOS) communication links. Wireless communication systems with RIS assistance are a promising and evolving technology that enables more efficient and reliable link performance over long distances. This article investigates the performance of a triple-hop RIS-assisted RF-FSO system converged with an underwater wireless optical communication (UWOC) system. We consider a Nakagami-m fading channel over the RIS-RF link and a Gamma-Gamma (GG) fading channel over the RIS-FSO and UWOC links. The average bit error rate (ABER) and outage probability are then determined using closed-form expressions. The ABER and outage probability performance of the triple-hop communication system is analysed by varying parameters such as turbulence, misalignment fading, and the number of RIS elements. The obtained results demonstrate an improvement in performance for low turbulence, low pointing error, and an increasing number of RIS elements. Additionally, the data demonstrate the accuracy of the analytical results. © 2013 IEEE.
Low Power, High Speed, Inductor-less Cascaded Charge Pump Phase Locked Loop
(Birkhauser, 2025) Kirankumar, H.L.; Rekha, S.; Laxminidhi, T.
A wide-frequency-range, inductor-less charge pump phase locked loop (CP-PLL) is presented in this paper. It has a multi-phase, two-stage cascaded architecture. The design uses a dead-zone-free, zero-blind-zone phase frequency detector (PFD) and a low-mismatch charge pump (CP) circuit to generate low-jitter clocks. A 625 MHz, 3-stage single-ended ring oscillator VCO is designed for the first stage. An 8-phase feed-forward coupled VCO with programmable multi-band operation ranging from 1.25 to 5 GHz is designed for the second stage of this cascaded system. Overall, the proposed cascaded PLL achieves a jitter FOM and jitter-N FOM of -227.1 and -250.1 dB, respectively, at 5 GHz output frequency with 1.44 ps rms jitter, while consuming 9.24 mW of power from a 1.2 V supply. The proposed clock generator circuit, designed in UMC 65 nm CMOS technology, occupies an area of 0.079 mm². This study contributes to the development of energy-efficient, high-speed clock generation solutions derived from a low reference clock. © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2025.
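The reported jitter FOM can be sanity-checked from the quoted rms jitter and power. The sketch below assumes the widely used definition FOM = 10·log10[(σ_t / 1 s)² · (P / 1 mW)]; the abstract does not state which definition the authors use, and the function name `jitter_fom_db` is hypothetical.

```python
import math


def jitter_fom_db(rms_jitter_s: float, power_w: float) -> float:
    """Jitter figure of merit in dB, assuming FOM = 10*log10((sigma/1 s)^2 * (P/1 mW))."""
    return 10.0 * math.log10((rms_jitter_s / 1.0) ** 2 * (power_w / 1e-3))


# Values quoted in the abstract: 1.44 ps rms jitter, 9.24 mW power.
fom = jitter_fom_db(1.44e-12, 9.24e-3)
print(f"jitter FOM = {fom:.2f} dB")
```

Under this assumed definition the result is roughly -227.2 dB, in line with the reported -227.1 dB. The jitter-N FOM additionally depends on the frequency multiplication factor, which the abstract does not give, so it is not reproduced here.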