Browsing by Author "Koolagudi, Shashidhar G."
Now showing 1 - 7 of 7
Item Acoustic Scene Classification Using Speech Features (National Institute of Technology Karnataka, Surathkal, 2020) Mulimani, Manjunath.; Koolagudi, Shashidhar G.
Currently, smart devices such as smartphones, laptops, and tablets need human intervention for the effective delivery of services. They are capable of recognizing speech, music, images, characters, and so on. To make smart systems behave intelligently, we need to build into them the capacity to understand and respond to the surrounding situation without human intervention. Enabling devices to sense the environment in which they are present, through analysis of sound, is the main objective of Acoustic Scene Classification. The initial step in analyzing the surroundings is recognition of the acoustic events present in the day-to-day environment. Such acoustic events are broadly categorized into two types: monophonic and polyphonic. Monophonic acoustic events are non-overlapping; in other words, at most one acoustic event is active at a given time. Polyphonic acoustic events are overlapping; in other words, multiple acoustic events occur at the same time instant. In this work, we aim to develop systems for the automatic recognition of monophonic and polyphonic acoustic events along with the corresponding acoustic scene. Applications of this research work include context-aware mobile devices, robots, intelligent monitoring systems, assistive technologies for hearing aids, and so on. Some of the important issues in this research area are: identifying event-specific features for acoustic event characterization and recognition; optimizing existing algorithms; developing robust mechanisms for acoustic event recognition in noisy environments; making state-of-the-art methods work on big data; and developing a joint model that recognizes acoustic events followed by their corresponding scenes.
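The monophonic/polyphonic distinction can be illustrated with a minimal decoding sketch. This is not the thesis's model; the frame-level event posteriors and the 0.5 threshold are hypothetical stand-ins for the output of a trained classifier:

```python
import numpy as np

def decode_monophonic(frame_probs):
    """Monophonic decoding: at most one event per frame, so take the argmax."""
    return np.argmax(frame_probs, axis=1)

def decode_polyphonic(frame_probs, threshold=0.5):
    """Polyphonic decoding: every event whose posterior exceeds the threshold
    is considered active, so a frame may hold several events at once."""
    return frame_probs >= threshold

# Toy posteriors for 3 frames x 4 event classes (hypothetical values).
probs = np.array([[0.9, 0.1, 0.7, 0.0],
                  [0.2, 0.8, 0.1, 0.6],
                  [0.1, 0.1, 0.1, 0.9]])

mono = decode_monophonic(probs)   # one event label per frame
poly = decode_polyphonic(probs)   # boolean frame x event activity matrix
```

Under the monophonic assumption each frame yields exactly one label, while the polyphonic decoder marks two simultaneous events in the first two frames.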
Some of the existing approaches have major limitations: they use traditional speech features, which are sensitive to noise; they use features from two-dimensional Time-Frequency Representations (TFRs) for recognizing acoustic events, which demand high computational time; or they use deep learning models, which require substantially large amounts of training data. Several novel approaches are presented in this thesis for the recognition of monophonic acoustic events, polyphonic acoustic events, and scenes. Two main challenges associated with real-time Acoustic Event Classification (AEC) are addressed. The first is the effective recognition of acoustic events in noisy environments, and the second is the use of the MapReduce programming model on a Hadoop distributed environment to reduce computational complexity. In this thesis, features are extracted from spectrograms, which are robust compared to traditional speech features. Further, an improved Convolutional Recurrent Neural Network (CRNN) and a Deep Neural Network-driven feature learning model are proposed for polyphonic Acoustic Event Detection (AED) in real-life recordings. Finally, binaural features are explored to train a Kervolutional Recurrent Neural Network (KRNN), which recognizes both the acoustic events and the respective scene of an audio signal. Detailed experimental evaluation is carried out to compare the performance of each of the proposed approaches against baseline and state-of-the-art systems.
Item Content-based Music Information Retrieval (CB-MIR) and its Applications Towards Music Recommender System (National Institute of Technology Karnataka, Surathkal, 2019) Murthy, Y V Srinivasa.; Koolagudi, Shashidhar G.
Music is a pervasive element of humans' day-to-day activities. Most people love to listen to music to handle their stress and tensions. Some are capable of creating music.
The importance of music to human beings, combined with advancements in technology, has resulted in an enormous number of digital tracks. However, the majority of tracks are available with inadequate meta-information, limited to the song title, album name, singer name, and composer. The question, then, is how to organize them effectively so that relevant clips can be retrieved quickly, without proper meta-information such as genre, lyrics, raga, mood, instrument names, etc. Labelling the meta-information manually for millions of tracks on the digital cloud is practically impossible. Hence, an area of research known as Music Information Retrieval (MIR) was introduced in the early years of the 21st century. It has acquired much attention from researchers since 2005 with the support of the Music Information Retrieval Evaluation eXchange (MIREX) competition. Several works have been proposed for various MIR tasks such as singing voice detection, singer identification, genre classification, instrument identification, music mood estimation, lyrics generation, music annotation, and so on. However, the main focus has been on Western music, and only a few works on Indian songs are reported in the literature. Since Indian popular songs contribute a major portion of the global digital cloud, in this thesis an attempt has been made to develop a few useful MIR tasks in the Indian scenario: vocal and non-vocal segmentation, singer identification, music mood estimation, and the development of a music recommender system. Efforts have been put into constructing relevant databases with possible coherence across all the tasks mentioned above. Results, including comparative analyses with standard datasets such as MIR-1K and artist20, are given. For each of the four tasks, a novel approach is presented in this thesis.
First, the task of vocal and non-vocal segmentation has been chosen, to locate the onset and offset points of singing-voice regions. A set of novel features, such as formant attack slope (FAS), formant height from base to peak (FH1), formant angle at peak (FA1), formant angle at valley (FA2), and the singer formant (F5), have been computed and used for discriminating vocal and non-vocal segments. Also, an attempt has been made to develop a feature selection algorithm based on the concepts of genetics, known as genetic algorithm based feature selection (GAFS). The observations made from this experimentation, using the selected features on the Indian and Western databases, are reported. Second, the task of singer identification (SID) has been considered. A database with the songs of 10 male and 10 female singers has been constructed. The songs are taken from two popular cine industries of the Indian subcontinent: Tollywood (Telugu) and Bollywood (Hindi). Various timbral and temporal features have been computed to analyze their effect on singer identification with different classifiers. However, the feature-based systems are found to be less effective, and hence convolutional neural networks (CNNs) have been used with spectrograms of song clips as inputs. Identifying the mood of a song has been considered as the third objective of this thesis. Six different moods are identified based on analysis of a combination of Russell's and Thayer's models (Saari and Eerola, 2014). We have developed a two-level classification model for music mood detection. In the first stage, songs are categorized as energetic or non-energetic. The actual class label is predicted in the second stage.
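The two-level mood classification described above can be sketched as follows. The stage classifiers and the 'energy'/'valence' features are hypothetical stand-ins, not the thesis's trained models; the point is only the cascade structure:

```python
def classify_mood(features, stage1, stage2_energetic, stage2_calm):
    """Two-level mood classification: stage one splits songs into energetic
    vs. non-energetic; stage two predicts the final mood label within the
    branch chosen by stage one."""
    if stage1(features) == "energetic":
        return stage2_energetic(features)
    return stage2_calm(features)

# Hypothetical rule-based stand-ins keyed on two toy features.
stage1 = lambda f: "energetic" if f["energy"] > 0.5 else "non-energetic"
stage2_energetic = lambda f: "happy" if f["valence"] > 0.5 else "angry"
stage2_calm = lambda f: "relaxed" if f["valence"] > 0.5 else "sad"

result = classify_mood({"energy": 0.8, "valence": 0.2},
                       stage1, stage2_energetic, stage2_calm)
```

Each stage solves an easier binary (or near-binary) problem than a flat six-way classifier, which is the rationale for the cascade.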
The performance of the system is found to be better in this case than with a single-phase classification model. Finally, the development of a recommender system has been taken up using labels such as the title of a track, singer name(s), mood of a song, and duration. A graph-structure-based recommendation system is proposed in this work to estimate the similarity in the listening patterns of listeners. A graph is constructed for every user by considering songs as nodes. Further, similarities are estimated using the adjacency matrices obtained from listening patterns. This approach could be appropriate for improving the performance of song recommender systems.
Item Control and Data Planes in Software Defined Data Center Networks: A Scalable and Resilient Approach (National Institute of Technology Karnataka, Surathkal, 2019) Hegde, Saumya; Koolagudi, Shashidhar G.; Bhattacharya, Swapan
The single central controller of a Software Defined Network (SDN) eases network management but leads to scalability problems. It is therefore ideal to have a logically centralized but physically distributed set of controllers. As part of this work, we developed a novel placement metric called subgraph-survivability and designed an algorithm for controller placement using this metric, such that the control plane is not only scalable but also resilient to failure of the controllers themselves. The controller collects network statistics and communicates the forwarding rules to the switches. This leads to the Edge-Core SDN architecture, where the edge and core networks have their own edge and core controllers. For such networks, we have developed separate edge and core controller placement algorithms using suitable metrics for each. The scalability problem of the data plane is due to the limited switch memory and the increased size of SDN forwarding rules. Using source routing to forward packets not only alleviates this problem but also complements the Edge-Core SDN model.
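Source routing of this kind can be sketched in a few lines: the edge writes the full path into the packet header, and each hop pops its output port instead of consulting a flow table. The port numbers and the pop-based header format are illustrative assumptions, not the thesis's exact encoding:

```python
from collections import deque

def build_header(path_ports):
    """Edge controller encodes the whole path as an ordered stack of
    output ports, so core switches need no per-flow table entries."""
    return deque(path_ports)

def forward(header):
    """Each switch pops the next output port from the header instead of
    performing a flow-table lookup."""
    return header.popleft()

header = build_header([3, 1, 4])               # hypothetical 3-hop path
hops = [forward(header) for _ in range(3)]     # ports used at each switch
```

Because the state travels with the packet, core flow tables stay small, which is the scalability argument made above.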
Here, we have proposed a source routing mechanism that is scalable, is fair to both elephant and mice traffic, and is resilient to link failures, thus making the data plane scalable and resilient. The algorithm and routing mechanism are validated through both analytical and empirical methods. The performance metrics Average Inverse Shortest Path Length (AISPL) and Network Disconnectedness (ND) are used to evaluate our placement algorithms. An improvement of 55.88% in the AISPL metric and 49.22% in the ND metric was observed with our proposed algorithm as compared to random controller placement. With our source routing mechanism, we observe a reduction in the number of flow table entries and in flow set-up time that is proportional to the number of hops along the path of the packet.
Item Emotion Recognition Using Speech Features (2013) Krothapalli, Sreenivasa Rao; Koolagudi, Shashidhar G.
Item Phonology Analysis From Children's Speech (National Institute of Technology Karnataka, Surathkal, 2022) Bhaskar, Ramteke Pravin; Koolagudi, Shashidhar G.
The human vocal tract can produce various sounds. Speech sounds are a relatively small set of such sounds that appear uniquely qualified for use in the production of speech. Speech production involves the positions of the parts of the body necessary for producing spoken words and the effect of air rushing from the lungs as it passes through the larynx, pharynx, vocal cords, nasal passages, and mouth. Phonetic sounds (phones) are the actual speech sounds, classified by the manner and place of articulation (i.e., the way in which air is forced through the mouth and shaped by the tongue, teeth, palate, lips, and, in some languages, the uvula). Children begin language acquisition with their first meaningful word. They then acquire language by mimicking adult pronunciation. This development depends mainly on the development of the vocal tract, neuro-motor control, and influence from the language of the people surrounding them.
Significant differences can be observed between the vocal tracts of children and adults: the vocal tract in children is underdeveloped and short in comparison with the adult vocal tract. Other oral-cavity structures, such as the tongue, larynx, epiglottis, and vocal cords, are also underdeveloped. Due to this, children face difficulty in producing speech sounds, and pronunciations are simplified by substituting difficult speech sounds with simpler ones. This results in significant deviations and replacements in the pronunciation of phonemes in children, leading to mispronunciations or pronunciation errors. These processes are referred to as phonological processes. The phonological processes appearing in children represent age-wise speech learning ability, and their analysis helps Speech Language Pathologists (SLPs) study the language learning ability of children. The manual process of phonology analysis involves a lot of human effort and time. The literature reports that phonological processes are well studied for children speaking English as a native language. Indian languages are syllabic in nature and differ from English, which is phonemic in nature. Hence, the observations made for English-speaking children may not be directly applicable to the study of phonological development in Indian children. In general, the appearance of phonological processes in Indian children is not well studied and documented. The appearance of these processes beyond a certain age may indicate the presence of a phonological disorder. Automatic identification helps SLPs identify the processes and analyse the language learning pattern, along with any disorders present if processes are observed beyond a certain age. In this work, we aim to develop systems for the automatic identification of phonological processes in the Kannada language.
Applications of this research work include evaluation of language learning ability, identification of speech and motor disorders, gender-based analysis of phonological processes, etc. Some of the important issues in this research area are: the large number of non-standardized phonological processes; the lack of detailed studies in Indian languages; the availability of children's speech databases in the required age range of 3½ to 6½ years; difficulties in adapting existing mispronunciation identification systems, due to the huge difference between the speech production parameters of adults and of children in the proposed age range; and the need to identify features characterizing each phonological process in comparison-based algorithms. We recorded a Kannada speech dataset from children between 3½ and 6½ years of age and named it the NITK Kids' Speech Corpus. It is collected in three age groups, with an interval of one year in each group. For each age range, data is recorded from 40 children (20 male and 20 female). This work provides a detailed analysis of the phonological processes that appear in children from 3½ to 6½ years of age speaking Kannada as a native language. Based on the pattern of disappearance of the phonological processes, an age-wise analysis of the acquisition of phonemes is provided. A detailed comparison of the language learning ability of children speaking English and children speaking Kannada is also performed. Comparison-based algorithms are adopted for the analysis, based on their effectiveness in identifying phonological processes in this small age range. The commonly observed phonological processes considered in our study are: aspiration, nasalization and nasal assimilation, palatal fricative fronting, final consonant deletion, voicing assimilation, and vowel deviations.
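A comparison-based identification step can be sketched as an alignment between the canonical phoneme sequence and the child's production, from which substitutions and deletions are read off. The phoneme strings here are hypothetical, and the thesis's actual features and algorithms differ; this only illustrates the comparison idea:

```python
from difflib import SequenceMatcher

def pronunciation_errors(expected, produced):
    """Align the canonical phoneme sequence with the produced one and report
    the edit operations -- the raw material for tagging phonological
    processes such as final consonant deletion."""
    errors = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(a=expected, b=produced).get_opcodes():
        if tag == "replace":
            errors.append(("substitution", expected[i1:i2], produced[j1:j2]))
        elif tag == "delete":
            errors.append(("deletion", expected[i1:i2], []))
        elif tag == "insert":
            errors.append(("insertion", [], produced[j1:j2]))
    return errors

# Hypothetical example: /kat/ produced as /ka/ -- final consonant deletion.
errs = pronunciation_errors(["k", "a", "t"], ["k", "a"])
```

A word-final deletion surfaces as a single "deletion" edit at the end of the alignment, which a rule layer could then label as the corresponding phonological process.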
Spectral, prosodic, and excitation source features efficient in discriminating the correct pronunciation of a phoneme from its mispronounced counterpart are identified and exploited for the identification of phonological processes. Two case studies are considered for the evaluation. Based on the availability of a dataset for a phonological disorder, rhotacism is considered for the analysis, and the spectral and prosodic features efficient in characterizing this disorder are explored. During the process of phonological process identification, we came across the interesting problem of children's gender identification. Gender identification from children's speech is difficult compared to adult gender identification; gender identification from adult speech is therefore also performed, to analyze the difficulties of the children's task in comparison with adult speech. Spectral, prosodic, and excitation source features are proposed for gender identification in both implementations, using suitable machine learning algorithms. Detailed experimental evaluation is carried out to compare the performance of each of the proposed approaches against baseline and state-of-the-art systems.
Item Robust Emotion Recognition Using Spectral and Prosodic Features (2013) Krothapalli, Sreenivasa Rao; Koolagudi, Shashidhar G.
Item Speech Processing Approaches towards Characterization and Identification of Dialects (National Institute of Technology Karnataka, Surathkal, 2020) Chittaragi, Nagaratna B.; Koolagudi, Shashidhar G.
Dialects constitute the phonological, lexical, and grammatical variations in the usage of a language, with very minor and subtle differences. These variations are mainly due to specific speaking patterns followed among groups of speakers. In the recent past, dialect identification from speech has been emerging as one of the prominent speech research areas.
This is mainly due to the extensive increase in the use of interactive voice-based systems. It is therefore essential to address the speech variability caused by dialectal differences in order to achieve effective, realistic man-machine interaction. Existing research on the characterization and identification of dialects has mainly focused on acoustic, phonetic, and phonotactic approaches for several languages such as English, Chinese, Arabic, Hindi, Spanish, etc. However, these models have not been shown to be language independent; applying them to other languages may not perform equally well, as there are many fundamental differences between the dialects of different languages. Moreover, comparatively few dialect processing models are reported in the literature for Indian regional languages. In this thesis, an attempt is made to develop a few useful language-independent and language-dependent Automatic Dialect Identification (ADI) systems for the Kannada language. First, a new text-independent Kannada Dialect Speech Corpus (KDSC) is collected from native speakers belonging to five prominent dialectal regions of Karnataka. This thesis investigates the significance of the excitation source, spectral, and prosodic features of speech for dialect identification. Additionally, spectro-temporal variations across dialects are captured through 2D Gabor features, which are known to be biologically inspired. Further, the existence of non-conventional dialect-specific rhythmic and melodic correlations among dialects is explored using chroma features, which are well-established features in music-related applications. The robustness of these proposed features has been investigated under noisy background conditions and with small (limited-data) audio clips. In addition, word- and sentence-based ADI systems are proposed using intonation and intensity variations representing dynamic and static prosodic behaviours.
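The chroma features mentioned above fold spectral energy into 12 pitch classes. A minimal sketch, assuming a precomputed magnitude spectrum; the reference tuning and the toy bin values are illustrative, not the thesis's extraction pipeline:

```python
import numpy as np

def chroma(magnitudes, freqs, ref=440.0):
    """Fold a magnitude spectrum into 12 pitch classes by mapping each
    frequency bin to its nearest semitone (relative to the reference
    frequency) and summing the energy per class, octave-invariantly."""
    chroma_vec = np.zeros(12)
    for mag, f in zip(magnitudes, freqs):
        if f <= 0:
            continue
        semitone = int(round(12 * np.log2(f / ref))) % 12
        chroma_vec[semitone] += mag
    return chroma_vec

# Two bins exactly an octave apart fall into the same pitch class.
c = chroma(np.array([1.0, 1.0]), np.array([440.0, 880.0]))
```

The octave folding is what makes chroma capture melodic and rhythmic patterning independently of which octave the energy sits in.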
Further, language-dependent dialect identification systems are proposed for the Kannada language using dialect information at the level of basic phonetic units. Additionally, Kannada-specific 'case' (Vibhakthi Prathyaya) based dialect identification approaches are proposed. A single-classifier approach based on Support Vector Machines (SVMs) and multiple-classifier ensemble algorithms are used for the classification of dialects. Experiments are carried out using individual features and combinations of features; the use of different features illustrates their complementary nature for dialect processing. A performance comparison of the two categories of classification algorithms shows that ensemble algorithms perform better than single-classifier algorithms. Further, the intuition of capturing rhythmic aspects of dialects through chroma and spectral-shape features yields better performance than state-of-the-art i-vector features; moreover, this feature set shows noise robustness over conventional MFCCs. In this work, we have also proposed intonation and intensity features to capture dialectal information from words and sentences for effective classification of dialects. In addition, the roles of duration, energy, pitch, three formants, and spectral features are also found to be evidential in Kannada dialect classification.
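An ensemble decision of the kind compared above can be sketched as simple majority voting over base classifiers; the dialect labels and per-classifier predictions here are hypothetical, and the thesis's ensemble algorithms may combine models differently:

```python
from collections import Counter

def majority_vote(predictions):
    """Ensemble decision: each base classifier votes for a dialect label,
    and the most common label across the ensemble wins."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical per-classifier predictions for one utterance.
votes = ["coastal", "northern", "coastal"]
label = majority_vote(votes)
```

Because independent classifiers rarely make the same mistake, the vote can outperform any single model, which matches the comparison result reported above.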
