Conference Papers
Permanent URI for this collection: https://idr.nitk.ac.in/handle/123456789/28506
Search Results
3 results
Item: Development of Voice Activated Ground Control Station (Elsevier B.V., 2016)
Rahul, D.K.; Veena, S.; Lokesha, H.; Vinay, S.; Kumar, B.P.; Ananda, C.M.; Durdi, V.B.
This paper chronicles the development of an Automatic Speech Recognition (ASR) system that can be integrated into the Ground Control Station (GCS) of MAVs to achieve voice activation. The first part of the paper highlights the nature of aerospace speech signals and, consequently, the issues to be considered while designing a voice-activated aerospace application. The second part describes the development and integration of an ASR capability into the GCS. © 2016 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license.

Item: Monophone and Triphone Acoustic Phonetic Model for Kannada Speech Recognition System (Institute of Electrical and Electronics Engineers Inc., 2022)
Kumar, T.N.M.; Jayan, A.; Bhat, S.; Anvith, M.; Narasimhadhan, A.V.
Automatic Speech Recognition (ASR) is the most widely used application in the speech domain. ASR systems generate text from spoken utterances without manual intervention. In this work, we build an ASR system for the Kannada language. To build the proposed system, we extract Mel Frequency Cepstral Coefficient (MFCC) features from the audio data, and the Kannada language model is developed using the corresponding labels. Dictionary generation and phonetic labelling are automated. Recognition performance is compared for the monophone and triphone models. A word error rate of 15.73% and a sentence error rate of 55.5% are achieved for the triphone model; the triphone model thus outperforms the monophone model. © 2022 IEEE.

Item: An Improved Transformer Transducer Architecture for Hindi-English Code Switched Speech Recognition (International Speech Communication Association, 2022)
Antony, A.; Kota, S.R.; Lade, A.; Spoorthy, V.; Koolagudi, S.G.
Due to the extensive use of technology across many languages worldwide, interest in Automatic Speech Recognition (ASR) systems for Code-Switching (CS) speech has grown in recent years. Several studies have shown that End-to-End (E2E) ASR is easier to adopt and performs much better in monolingual settings. E2E systems are also widely recognised for requiring massive quantities of labelled speech data. Since large amounts of CS speech are scarce, E2E ASR takes longer to train and does not offer promising results. In this work, an E2E ASR system using a transformer-transducer architecture is introduced for code-switched Hindi-English speech, and training-data scarcity is addressed by leveraging the vastly available monolingual data. Specifically, the language-specific modules in the Transformer are pre-trained on widely available single-language speech datasets. The proposed method achieves a Word Error Rate (WER) of 29.63% and a Transliterated Word Error Rate (T-WER) of 27.42%, improving on the state of the art by 2.19%. © 2022 ISCA.
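The Kannada paper above compares monophone models (one acoustic model per phone) with triphone models (one model per phone in its left and right phonetic context). As a minimal sketch of what that context expansion looks like, the snippet below converts a monophone sequence into HTK-style `left-centre+right` triphone labels; the function name, the `"sil"` boundary padding, and the example phone sequence are illustrative assumptions, not taken from the paper:

```python
def to_triphones(phones):
    """Expand a monophone sequence into context-dependent triphone
    labels of the form left-centre+right (HTK-style notation).
    Utterance boundaries are padded with silence ("sil")."""
    out = []
    for i, p in enumerate(phones):
        left = phones[i - 1] if i > 0 else "sil"
        right = phones[i + 1] if i < len(phones) - 1 else "sil"
        out.append(f"{left}-{p}+{right}")
    return out

# Example: a three-phone sequence yields three context-dependent labels
print(to_triphones(["k", "a", "n"]))  # ['sil-k+a', 'k-a+n', 'a-n+sil']
```

Because each phone is modelled jointly with its neighbours, triphone systems capture coarticulation effects that monophone systems miss, which is consistent with the lower error rates the paper reports for the triphone model.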
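Both 2022 abstracts report results as Word Error Rate (WER). As a hedged illustration of the standard definition (word-level Levenshtein edit distance, i.e. substitutions + deletions + insertions, divided by the number of reference words), here is a self-contained sketch; the function name, variable names, and example sentences are illustrative, not drawn from either paper:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level Levenshtein edit distance between
    reference and hypothesis, divided by the reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                      # delete all remaining reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j                      # insert all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)

# Example: one substitution in a four-word reference -> WER = 0.25
print(wer("speech is recognised here", "speech was recognised here"))  # 0.25
```

Sentence error rate, also reported in the Kannada paper, is simpler: the fraction of utterances whose hypothesis is not an exact match of the reference.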
