Conference Papers

Search Results


Sample-based DC prediction strategy for HEVC lossless intra prediction mode
(Institute of Electrical and Electronics Engineers Inc., 2017) Kamath, S.S.; Aparna, P.; Antony, A.
High-Efficiency Video Coding (HEVC), the state-of-the-art video coding standard by the ITU-T Video Coding Experts Group and the ISO/IEC Moving Picture Experts Group, is presently being prepared to handle next-generation multimedia services. The lossless mode of HEVC is designed to support a variety of lossless compression applications such as medical imaging, preservation of artwork, and video analytics. The accuracy of intra prediction can be improved by incorporating sample-based prediction strategies that replace the block-based prediction within HEVC. In this work, we propose a sample-based DC intra prediction strategy to enhance the compression efficiency of the HEVC lossless mode. The detailed experimental analysis demonstrates that the proposed method outperforms the HEVC lossless mode of HM16.12 in terms of bit-rate savings by 1.43% and 0.46% on average for the AI-Main and AI-Main10 configurations, respectively, without any increase in run-time. © 2017 IEEE.
An Improved Transformer Transducer Architecture for Hindi-English Code Switched Speech Recognition
(International Speech Communication Association, 2022) Antony, A.; Kota, S.R.; Lade, A.; Spoorthy, V.; Koolagudi, S.G.
Due to the extensive use of technology in many languages throughout the world, interest in Automatic Speech Recognition (ASR) systems for Code-Switching (CS) in speech has grown in recent years. Several studies have shown that End-to-End (E2E) ASR is easier to adopt and works much better in monolingual settings. E2E systems are likewise widely recognised for requiring massive quantities of labelled speech data. Because large amounts of CS speech are scarce, E2E ASR takes longer to compute and does not offer promising results. In this work, an E2E ASR system using a transformer-transducer architecture is introduced for code-switched Hindi-English speech, and the scarcity of training data is addressed by leveraging the widely available monolingual data. Specifically, the language-specific modules in the Transformer are pre-trained on the widely available single-language speech datasets. The proposed method achieves a Word Error Rate (WER) of 29.63% and a Transliterated Word Error Rate (T-WER) of 27.42%, which is better than the state of the art by 2.19%. © 2022 ISCA.
