Faculty Publications

Permanent URI for this community: https://idr.nitk.ac.in/handle/123456789/18736

Publications by NITK Faculty

Search Results

Now showing 1 - 8 of 8
  • Item
    Cochlear Acoustic Model that Improves the Speech Perception in Noise by Encoding TFS
    (Springer Science and Business Media Deutschland GmbH, 2022) Poluboina, V.; Pulikala, A.; Pitchaimuthu, A.
People with cochlear implants achieve good speech recognition scores in quiet. The temporal envelope (ENV) is the primary cue encoded by a cochlear implant (CI), and it is sufficient for recognizing speech in quiet. However, temporal fine structure (TFS) is needed for better recognition of speech in noise. Some fine structure coding strategies have tried to modulate the temporal envelope with TFS; among these, FS4 encodes fine structure up to 950 Hz. In this study, the performance of FS4 for speech recognition in noise was investigated using acoustic simulation. The speech intelligibility test was conducted on five normal-hearing (NH) listeners, and the performance was compared with a 16-channel sinewave vocoder and with a full-band TFS condition. The variance across these three conditions was analyzed using SNR-50. The results indicate that fine structure (FS4) coding (up to 1078 Hz) improved speech recognition in noise compared to the sinewave vocoder. © 2022, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
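A minimal sketch of the envelope-only sinewave vocoder used as the comparison condition: each band's temporal envelope modulates a sine at the band's centre frequency, discarding TFS. The FFT-based filtering, channel edges, and 50 Hz envelope cutoff are illustrative assumptions, not the study's exact parameters.

```python
import numpy as np

def sinewave_vocode(x, fs, edges):
    """Crude N-channel sinewave vocoder: per band, extract the temporal
    envelope (rectify + FFT lowpass) and use it to modulate a sine at
    the band's geometric centre frequency. `edges` are band edges in Hz."""
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    t = np.arange(len(x)) / fs
    out = np.zeros_like(x)
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = np.fft.irfft(np.where((freqs >= lo) & (freqs < hi), X, 0), len(x))
        env = np.abs(band)                       # rectified band signal
        E = np.fft.rfft(env)                     # keep ENV (<= 50 Hz), drop TFS
        env = np.clip(np.fft.irfft(np.where(freqs <= 50, E, 0), len(x)), 0, None)
        out += env * np.sin(2 * np.pi * np.sqrt(lo * hi) * t)
    return out
```

Vocoding a 1 kHz tone with a channel spanning 700–1500 Hz shifts its output energy to that channel's centre (about 1025 Hz), illustrating why TFS cues are lost.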
  • Item
    Hardware-Optimized Deep Learning Model for FPGA-Based Character Recognition
    (Institute of Electrical and Electronics Engineers Inc., 2023) Rao, P.S.; Pulikala, A.
Deep neural networks (DNNs) are among the most widely used algorithms in machine learning. Even though most deep learning applications are driven by software solutions, there has been significant research and development aimed at optimizing these algorithms over the years. However, for hardware implementations, it becomes essential to optimize the design not only in software but also in hardware. In this paper, we present a straightforward yet effective convolutional neural network architecture that is meticulously optimized in both hardware and software for character recognition applications. The accelerator was implemented on a Xilinx Zynq XC7Z020CLG484 FPGA using a high-level synthesis tool. To enhance performance, the accelerator employs an optimized fixed-point data type and applies loop parallelization techniques that combine the 2D convolution and 2D max-pooling operations. The hardware efficiency of the proposed DNN is compared with some of the existing architectures in terms of hardware utilization. © 2023 IEEE.
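The abstract's "optimized fixed-point data type" can be illustrated with a small quantizer: weights are rounded to a signed Qm.n grid and saturated, as an FPGA datapath would do. The Q2.6 format (8 bits total) here is an illustrative assumption, not the format reported in the paper.

```python
import numpy as np

def to_fixed_point(w, int_bits=2, frac_bits=6):
    """Quantize values to a signed Qm.n fixed-point grid (here Q2.6,
    8 bits total): round to the nearest 2**-frac_bits step and saturate
    at the representable range [-2**m, 2**m - 2**-n]."""
    step = 2.0 ** -frac_bits
    lo = -(2.0 ** int_bits)
    hi = 2.0 ** int_bits - step
    return np.clip(np.round(w / step) * step, lo, hi)
```

In-range values land within half a step of the original; out-of-range values saturate, which is the usual trade-off when shrinking the datapath width.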
  • Item
    Robustness Analysis of EV Charging System using Random Forest Algorithm
    (Institute of Electrical and Electronics Engineers Inc., 2023) Barre, U.P.V.; Satyanarayan, S.; Reddy, H.; Pulikala, A.; Bajaj, A.
Electric cars offer numerous benefits and are considered the future of the automobile industry. However, their worldwide adoption has yet to grow. One of the primary reasons for this delay in electrification is charge anxiety: the uncertainty customers feel when connecting the charging cable to the car. To address this issue, this study analyses the performance of the charging system using a machine learning model to identify the sensitive signals that influence the charging process and determine whether charging succeeds or is terminated. The analysis also helps to define robust operating regions in which the charging components can function reliably, regardless of external conditions. The study's findings provide insights into electric vehicle charging behavior at the supply station. © 2023 IEEE.
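The abstract does not detail how "sensitive signals" are ranked by the random forest; a model-agnostic permutation-importance sketch conveys the idea. The synthetic data, the stand-in classifier, and the accuracy-drop scoring are all illustrative assumptions, not the authors' method.

```python
import numpy as np

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Accuracy drop when one input signal (column) is shuffled.
    Signals producing the largest drop are the ones the charging
    outcome is most sensitive to."""
    rng = np.random.default_rng(seed)
    base = np.mean(predict(X) == y)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drop = 0.0
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])          # break this signal's link to y
            drop += base - np.mean(predict(Xp) == y)
        importances[j] = drop / n_repeats
    return importances
```

The same scoring works around any fitted model (including a random forest), which is why it is a common way to probe robustness to individual signals.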
  • Item
    Speech Intelligibility Enhancement for Cochlear Implant using Multi-Objective Deep Denoising Autoencoder
    (Institute of Electrical and Electronics Engineers Inc., 2023) Vishnu, B.U.P.; Poluboina, V.; Sushma, B.; Pulikala, A.
This study introduces a novel technique for enhancing the performance of deep denoising autoencoders (DDAE) in speech processing for cochlear implants (CIs). For individuals with hearing loss, cochlear implants are electronic devices that help to restore their ability to hear. However, the speech intelligibility delivered by CIs in noisy environments is limited. One of the most commonly used methods for reducing noise in CIs is a preprocessing technique called the deep denoising autoencoder. DDAE models have shown potential in learning various noise patterns, but their ability to enhance speech intelligibility is relatively low due to an ineffective objective function. To address this limitation, this study proposes a multi-objective technique to fine-tune the DDAE model. When multiple objectives are optimized simultaneously, the model becomes more robust and better at handling real-world noise. The experimental findings confirm that the proposed multi-objective learning technique outperforms other models in terms of speech intelligibility. Furthermore, the enhanced signal is presented to an acoustic cochlear implant simulator to evaluate the improvement of speech intelligibility in CIs. © 2023 IEEE.
  • Item
    Contribution of frequency compressed temporal fine structure cues to the speech recognition in noise: An implication in cochlear implant signal processing
    (Elsevier Ltd, 2022) Poluboina, V.; Pulikala, A.; Pitchai Muthu, A.N.
The study investigated the effect of proportionally frequency-compressed encoding of temporal fine structure (TFS) information on speech perception in noise, using vocoder simulations of cochlear implant signal processing. A pitch synchronous overlap-add (PSOLA) algorithm was proposed for downward frequency shifting of the TFS. Speech recognition scores (SRS) were measured at −10 dB, 0 dB, and +10 dB SNR for eight signal processing conditions: a sinewave vocoder without TFS (NO-TFS); four unshifted TFS conditions comprising full-band TFS and TFS up to 2000, 1000, and 600 Hz; and three PSOLA conditions that shifted the 2000, 1000, and 600 Hz TFS down to 1000, 500, and 300 Hz, respectively. The original envelope was unchanged across conditions. SRS at +10 dB and −10 dB SNR reached ceiling and floor, respectively, in most conditions; hence, SRS at 0 dB SNR was compared across conditions. The results showed that SRS was highest with full-band TFS and lowest for the NO-TFS condition. The SRS for 600 Hz TFS shifted to 300 Hz through PSOLA was higher than for the NO-TFS condition. The findings suggest that encoding TFS by proportional frequency compression results in better speech perception in noise compared to NO-TFS. An important observation of the current study is that speech recognition was better than with the sinewave vocoder for all TFS conditions, including the frequency-compressed 600 Hz TFS. © 2021 Elsevier Ltd
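Proportional frequency compression of TFS can be sketched by scaling the instantaneous phase of the analytic signal: a 1000 Hz carrier compressed by a factor of 0.5 comes out at 500 Hz while the envelope is untouched. This is a conceptual stand-in for the study's PSOLA-based shifting, not the PSOLA algorithm itself.

```python
import numpy as np

def compress_tfs(x, alpha):
    """Proportionally compress the fine structure: form the analytic
    signal (FFT-based Hilbert transform), keep its envelope, and rebuild
    the carrier with instantaneous phase scaled by `alpha` (e.g. 0.5
    shifts 1000 Hz TFS down to 500 Hz)."""
    X = np.fft.fft(x)
    h = np.zeros(len(x))
    h[0] = 1
    h[1:len(x) // 2] = 2        # double positive frequencies
    h[len(x) // 2] = 1          # Nyquist bin (even-length signal)
    z = np.fft.ifft(X * h)      # analytic signal
    env, phase = np.abs(z), np.unwrap(np.angle(z))
    return env * np.cos(alpha * phase)
```

Because only the phase is scaled, the envelope delivered by the implant's ENV channels would stay intact, which is the constraint the study's conditions also maintain.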
  • Item
    An Improved Noise Reduction Technique for Enhancing the Intelligibility of Sinewave Vocoded Speech: Implication in Cochlear Implants
    (Institute of Electrical and Electronics Engineers Inc., 2023) Poluboina, V.; Pulikala, A.; Pitchaimuthu, A.N.P.
A cochlear implant (CI) is the most suitable option for individuals with severe-to-profound hearing loss. A CI restores audibility to near perfection and offers good speech understanding in quiet. However, speech perception in noise with CIs is less than optimal, as most CI speech coding strategies encode only the temporal envelope. Besides, current CI signal coding strategies lack sophisticated pre-processing. In the current study, we propose a novel pre-processing method to improve speech intelligibility in noise and test it using acoustic simulations of cochlear implants. The proposed noise reduction technique aims to minimize the mean square error (MSE) between the temporal envelopes of the enhanced speech and the corresponding clean speech; therefore, the proposed method is well suited for CI applications. This paper provides the theoretical derivation of the noise suppression function as well as a performance evaluation using objective and subjective tests. The effectiveness of the proposed method was objectively evaluated using SRMR-CI and ESTOI; additionally, speech recognition through acoustic simulations of the cochlear implant was conducted for the subjective evaluation. The performance of the proposed method was compared with the Wiener filter (WF) and sigmoidal functions, with a sinewave vocoder used to simulate cochlear implant perception. Both objective and subjective scores revealed that the performance of the proposed technique is superior to that of the WF and the sigmoidal function. © 2013 IEEE.
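For context, the Wiener filter (WF) baseline the paper compares against applies a per-frequency suppression gain G = SNR/(1+SNR). The spectral-subtraction SNR estimate and the gain floor below are common illustrative choices, not the paper's exact baseline configuration; the proposed method instead minimizes envelope-domain MSE.

```python
import numpy as np

def wiener_gain(noisy_psd, noise_psd, floor=0.05):
    """Classic Wiener suppression gain G = SNR / (1 + SNR), with the SNR
    crudely estimated by spectral subtraction and the gain floored to
    limit musical-noise artifacts."""
    snr = np.maximum(noisy_psd - noise_psd, 0) / np.maximum(noise_psd, 1e-12)
    return np.maximum(snr / (1 + snr), floor)
```

Bins dominated by speech (high SNR) pass nearly unchanged, while noise-dominated bins are attenuated down to the floor.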
  • Item
    AAPFC-BUSnet: Hierarchical encoder–decoder based CNN with attention aggregation pyramid feature clustering for breast ultrasound image lesion segmentation
    (Elsevier Ltd, 2024) Sushma, B.; Pulikala, A.
Breast cancer poses a serious menace to women's health and lives, underscoring the urgency of accurate tumor detection. Detecting both cancerous and non-cancerous breast tumors has become increasingly crucial, with ultrasound imaging emerging as a widely adopted modality for this purpose. However, identifying breast lesions in ultrasound images is challenging due to varied tumor morphology and geometry, similar color intensity distributions, and fuzzy boundaries, particularly for irregularly shaped malignant tumors. This work proposes an encoder–decoder based U-shaped convolutional neural network (CNN) variant with an attention aggregation-based pyramid feature clustering module (AAPFC) to detect breast lesion regions. The network consists of a U-Net variant as the base network and the AAPFC module, which fuses features extracted at the various levels of the base U-Net using a suitable feature fusion technique. Furthermore, deformable convolution with an adaptive self-attention mechanism is introduced to decode the pyramid features in parallel, capturing various geometric features at multiple stages. Two public breast lesion ultrasound datasets, consisting of 263 malignant, 547 benign, and 133 normal images, are used to evaluate the performance of the proposed model against state-of-the-art deep CNN-based segmentation models. The proposed model achieves 96% accuracy, 68% mean IoU, 97% specificity, 82% sensitivity, and a 0.747 kappa score. Qualitative and quantitative analyses show that the proposed model performs better in breast lesion segmentation on ultrasound images. © 2024 Elsevier Ltd
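The mean IoU figure reported above is computed per class and averaged; a small sketch of that metric for binary lesion masks (background = 0, lesion = 1), assuming the standard per-class definition:

```python
import numpy as np

def mean_iou(pred, target, n_classes=2):
    """Mean intersection-over-union across classes, the usual
    segmentation metric for binary lesion masks."""
    ious = []
    for c in range(n_classes):
        p, t = pred == c, target == c
        inter = np.logical_and(p, t).sum()
        union = np.logical_or(p, t).sum()
        if union:                      # skip classes absent from both masks
            ious.append(inter / union)
    return float(np.mean(ious))
```

Because the background class is usually much larger than the lesion, mean IoU can sit well below pixel accuracy, which is consistent with the 68% mean IoU versus 96% accuracy reported.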
  • Item
    Deep Speech Denoising with Minimal Dependence on Clean Speech Data
    (Birkhauser, 2024) Poluboina, V.; Pulikala, A.; Pitchaimuthu, A.N.
Most existing deep learning-based speech denoising methods rely heavily on clean speech data: according to the traditional view, a large number of paired noisy and clean speech samples is required for good denoising performance. However, data collection is a technical barrier to this requirement, particularly in economically challenged areas and for languages with limited resources. Training deep denoising networks with only noisy speech samples is a viable option to avoid this dependence. In this study, a DCU-Net was trained using only noisy speech samples as both input and target. Experimental results demonstrate that, compared to traditional speech denoising techniques, the proposed approach avoids not only the high dependence on clean targets but also the high dependence on large data sizes. © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.
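The statistical principle behind training with noisy targets (the Noise2Noise idea) can be shown without any network: under zero-mean noise, the MSE-optimal predictor of a noisy target is the clean signal, so averaging independent noisy realizations approaches the clean signal using no clean data. This toy demo illustrates only the principle, not the paper's DCU-Net training.

```python
import numpy as np

def noisy_target_estimate(noisy_targets):
    """With zero-mean noise, the MSE-optimal predictor of a noisy target
    equals the underlying clean signal; averaging many independent noisy
    realizations therefore converges toward it."""
    return np.mean(noisy_targets, axis=0)
```

A network trained on (noisy input, noisy target) pairs exploits the same property implicitly: the MSE gradient pulls its output toward the noise-free expectation of the targets.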