Faculty Publications

Permanent URI for this community: https://idr.nitk.ac.in/handle/123456789/18736

Publications by NITK Faculty

  • Item
    Classification of vocal and non-vocal regions from audio songs using spectral features and pitch variations
    (Institute of Electrical and Electronics Engineers Inc., 2015) Vishnu Srinivasa Murthy, Y.V.S.; Koolagudi, S.G.
    In this work, an effort has been made to identify vocal and non-vocal regions in a given song using signal processing techniques and machine learning algorithms. Initially, spectral features such as mel-frequency cepstral coefficients (MFCCs) are used to develop the baseline system. Statistical values of pitch, jitter and shimmer are then considered to improve the performance of the system. Artificial neural networks (ANNs) are used to capture the characteristics of the vocal and non-vocal segments of the songs. The experiment is conducted on 60 vocal and 60 non-vocal clips extracted from Telugu albums. An 11-point moving window is used to ensure the continuity of vocal and non-vocal segments, thus improving the accuracy of the system. With this approach, the system achieves 85.59% accuracy for vocal and 88.52% for non-vocal segment classification. © 2015 IEEE.
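    The 11-point moving-window step described in the abstract can be read as a majority vote over frame-level vocal/non-vocal predictions, which removes isolated label flips while leaving sustained segments intact. A minimal sketch of that idea follows; `smooth_labels` is a hypothetical helper, not the authors' code, and the window size is the only detail taken from the abstract.

    ```python
    import numpy as np

    def smooth_labels(labels, window=11):
        """Majority-vote smoothing of per-frame vocal(1)/non-vocal(0)
        predictions with a centered moving window (hypothetical helper)."""
        labels = np.asarray(labels)
        half = window // 2
        # extend the edges so boundary frames still see a full window
        padded = np.pad(labels, half, mode="edge")
        smoothed = np.empty_like(labels)
        for i in range(len(labels)):
            win = padded[i:i + window]
            # label 1 wins when it holds a strict majority of the window
            smoothed[i] = 1 if win.sum() * 2 > window else 0
        return smoothed

    # a single mislabelled frame inside a vocal run is flipped back to vocal
    raw = [1] * 20 + [0] + [1] * 20 + [0] * 30
    print(smooth_labels(raw)[20])  # → 1
    ```

    Majority voting over a centred window is one common way to realise such smoothing; the abstract does not specify whether the authors used a vote, a median filter, or another rule.
    
    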
  • Item
    Repetition detection in stuttered speech
    (Springer Science and Business Media Deutschland GmbH, 2016) Ramteke, P.B.; Koolagudi, S.G.; Afroz, F.
    This paper focuses on the detection of repetitions in stuttered speech. The stuttered speech signal is divided into isolated units based on energy. Mel-frequency cepstral coefficients (MFCCs), formants and shimmer are used as features for repetition recognition; these features are extracted from each isolated unit. Using dynamic time warping (DTW), the features of each isolated unit are compared with those of the subsequent units within a one-second interval of speech. Based on an analysis of the scores obtained from DTW, a threshold is set; if a score is below this threshold, the corresponding units are identified as repeated events. The twenty-seven seconds of speech data used in this work contain 50 repetition events. The results show that the combination of MFCCs, formants and shimmer can be used for the recognition of repetitions in stuttered speech: out of 50 repetitions, 47 are correctly identified. © Springer India 2016.
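    The DTW comparison described above can be sketched as follows: align two per-unit feature sequences with the standard dynamic-time-warping recursion, normalise the accumulated cost by the path length, and flag the pair as a repetition when the score falls below a threshold. This is a minimal illustration, assuming Euclidean frame distances and an illustrative threshold value; the paper's exact feature preparation and threshold are not reproduced here.

    ```python
    import numpy as np

    def dtw_distance(a, b):
        """Dynamic-time-warping distance between two feature sequences
        (frames x dims), normalised by sequence length (a sketch)."""
        n, m = len(a), len(b)
        D = np.full((n + 1, m + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = np.linalg.norm(a[i - 1] - b[j - 1])
                # standard DTW recursion: insertion, deletion, match
                D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
        return D[n, m] / (n + m)

    # two nearly identical units score low and are flagged as a repetition
    u1 = np.array([[0.0, 1.0], [0.1, 1.1], [0.2, 1.2]])
    u2 = np.array([[0.0, 1.0], [0.15, 1.1], [0.2, 1.25]])
    THRESHOLD = 0.5  # illustrative value, not taken from the paper
    print(dtw_distance(u1, u2) < THRESHOLD)  # → True
    ```

    In the paper's setup the feature vectors per frame would stack MFCCs, formants and shimmer; the toy 2-dimensional frames here only demonstrate the alignment-and-threshold mechanics.
    
    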
  • Item
    Choice of a classifier, based on properties of a dataset: case study-speech emotion recognition
    (Springer New York LLC, 2018) Koolagudi, S.G.; Vishnu Srinivasa Murthy, Y.V.S.; Bhaskar, S.P.
    In this paper, a process for selecting a classifier based on the properties of a dataset is designed, since it is impractical to experiment with every available classifier. Speech emotion recognition is considered as a case study. Different combinations of spectral and prosodic features relevant to emotions are explored, and the best subset of the chosen features is recommended for each classifier based on the properties of the chosen dataset. Various statistical tests are used to estimate the properties of the dataset, and the nature of the dataset guides the selection of a relevant classifier. For comparison, three other clustering and classification techniques, namely K-means clustering, vector quantization and artificial neural networks, are used for experimentation, and their results are compared with those of the selected classifier. Prosodic features such as pitch, intensity, jitter and shimmer, and spectral features such as mel-frequency cepstral coefficients (MFCCs) and formants, are considered in this work. Statistical parameters of prosody, namely the minimum, maximum, mean (μ) and standard deviation (σ), are extracted from speech and combined with the basic spectral (MFCC) features for better performance. Five basic emotions, namely anger, fear, happiness, neutral and sadness, are considered. To analyse the performance of different datasets on different classifiers, content- and speaker-independent emotional data collected from Telugu movies is used, and mean opinion scores from fifty users are collected to label the emotional data. To generalize the conclusions, the benchmark IIT-Kharagpur emotional database is also used. © 2018, Springer Science+Business Media, LLC, part of Springer Nature.
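    The feature construction described above, statistical parameters of a prosodic contour combined with spectral (MFCC) features, can be sketched as follows. The function and variable names are illustrative, and averaging the MFCC frames is one simple way to pool them into a fixed-length vector; the paper does not state its exact pooling scheme.

    ```python
    import numpy as np

    def prosody_stats(contour):
        """Minimum, maximum, mean (mu) and standard deviation (sigma)
        of a prosodic contour such as pitch or intensity."""
        c = np.asarray(contour, dtype=float)
        return np.array([c.min(), c.max(), c.mean(), c.std()])

    # illustrative pitch contour (Hz) and random stand-in MFCC frames
    pitch = [210.0, 215.0, 220.0, 218.0]
    mfcc_frames = np.random.randn(100, 13)   # frames x coefficients

    # combine 4 prosody statistics with 13 pooled MFCCs -> 17-dim vector
    feature_vector = np.concatenate([prosody_stats(pitch),
                                     mfcc_frames.mean(axis=0)])
    print(feature_vector.shape)  # → (17,)
    ```

    The same four statistics would be computed per contour (pitch, intensity, jitter, shimmer) and concatenated, giving one fixed-length vector per utterance that any of the compared classifiers can consume.
    
    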