Enhancing Speech Perception In Cochlear Implants: Novel Approaches In Encoding Temporal Fine Structures and Noise Reduction
Date
2023
Authors
Venkateswarlu, Poluboina
Publisher
National Institute Of Technology Karnataka Surathkal
Abstract
Cochlear implants (CIs) significantly enhance audibility and speech
intelligibility in quiet environments. Nevertheless, speech recognition in
noisy conditions remains a notable challenge. Efforts to enhance speech
perception in cochlear implants typically follow two approaches:
preprocessing, which improves the signal-to-noise ratio (SNR), and speech
coding, which encodes the cues necessary for speech recognition in noisy
environments. The current thesis addresses both approaches. The first part
concerns speech coding and examines the impact of temporal fine structure
(TFS) cues conveyed through proportional frequency compression. In the
second part, two denoising techniques are proposed as preprocessing to
improve the SNR: a modified Wiener filter method and a deep-learning-based
denoising method for speech enhancement.
The research investigates the significance of TFS cut-off frequencies in CI
speech coding for speech perception in noise. Based on these observations,
an algorithm is introduced that represents TFS through proportionally
frequency-compressed cues. Additionally, a pitch-synchronous overlap-add
(PSOLA) algorithm is proposed to encode TFS within the neurophysiological
limitations of CI users. Speech recognition scores (SRS) are measured
under various signal processing conditions: a sinewave vocoder without
TFS, four unshifted TFS conditions with varying cut-off frequencies, and
three PSOLA conditions that shift TFS frequencies. The original envelope
remains unchanged across all conditions. The results indicate that the SRS
for the TFS 600 Hz condition shifted to 300 Hz through PSOLA exceeds that
of the no-TFS condition (sinewave vocoder), suggesting that encoding TFS
using proportional frequency compression improves speech perception in
noise compared to the absence of TFS.
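The no-TFS baseline mentioned above, a sinewave vocoder, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the function name `sinewave_vocoder`, the band count, the log-spaced band edges, the FFT-mask filtering, and the moving-average envelope smoother are all hypothetical choices made for brevity.

```python
import numpy as np

def sinewave_vocoder(x, fs, n_bands=8, f_lo=100.0, f_hi=4000.0, env_win_ms=8.0):
    """Hypothetical sketch of an envelope-only (no-TFS) sine-carrier vocoder."""
    n = len(x)
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    # Log-spaced analysis band edges (an assumption, not taken from the thesis).
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)
    # Moving-average kernel used as a crude envelope low-pass filter.
    win = max(1, int(fs * env_win_ms / 1000.0))
    kernel = np.ones(win) / win
    t = np.arange(n) / fs
    out = np.zeros(n)
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Band-pass by zeroing FFT bins outside [lo, hi).
        mask = (freqs >= lo) & (freqs < hi)
        band = np.fft.irfft(np.where(mask, spectrum, 0), n=n)
        # Envelope: full-wave rectification followed by smoothing.
        env = np.convolve(np.abs(band), kernel, mode="same")
        # Replace the band's fine structure with a fixed sine carrier
        # at the geometric centre frequency of the band.
        fc = np.sqrt(lo * hi)
        out += env * np.sin(2 * np.pi * fc * t)
    return out

# Usage: vocode half a second of a 500 Hz tone sampled at 16 kHz.
fs = 16000
tone = np.sin(2 * np.pi * 500.0 * np.arange(fs // 2) / fs)
vocoded = sinewave_vocoder(tone, fs)
```

Because each band keeps only its envelope and discards the original fine structure, a sketch like this corresponds to the condition against which the TFS-encoding conditions are compared.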
Furthermore, a modified Wiener filter method is proposed to enhance speech
intelligibility in noisy environments, specifically in the context of
cochlear implants. This noise reduction technique aims to minimize the
mean square error (MSE) between the temporal envelopes of the enhanced
speech and the clean speech, making it suitable for CI applications. The
study provides a theoretical analysis of the noise suppression function
and evaluates its performance using objective and subjective tests.
Objective measures such as the speech-to-reverberation modulation energy
ratio tailored to CIs (SRMR-CI) and extended short-time objective
intelligibility (ESTOI) are employed, while subjective evaluation involves
speech recognition through acoustic simulations of the cochlear implant.
The proposed method's performance is compared with the Wiener filter (WF)
and sigmoidal functions, using the sinewave vocoder to simulate cochlear
implant perception.
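For orientation, the classical Wiener filter (the WF baseline, not the modified method proposed in the thesis) applies a per-bin gain of ξ / (1 + ξ), where ξ is the a priori SNR. The sketch below is a hedged illustration only: the function name `wiener_gain`, the maximum-likelihood SNR estimate, and the gain floor are assumptions introduced here, and the noise power is assumed to be strictly positive.

```python
import numpy as np

def wiener_gain(noisy_power, noise_power, gain_floor=1e-3):
    """Classical Wiener suppression gain per time-frequency bin (illustrative)."""
    # Maximum-likelihood estimate of the a priori SNR, clipped at zero.
    # Assumes noise_power > 0 in every bin.
    snr_prior = np.maximum(noisy_power / noise_power - 1.0, 0.0)
    gain = snr_prior / (1.0 + snr_prior)
    # A small floor (hypothetical value) limits musical-noise artifacts.
    return np.maximum(gain, gain_floor)

# Usage: a bin whose noisy power is twice the noise power has an a priori
# SNR of 1 (0 dB), so the gain is 0.5; a noise-only bin falls to the floor.
gains = wiener_gain(np.array([2.0, 1.0]), np.array([1.0, 1.0]))
```

The enhanced spectrum would be obtained by multiplying the noisy short-time spectrum by this gain; the thesis's modification instead targets the MSE between temporal envelopes, which this sketch does not attempt to reproduce.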
Finally, a new deep-learning training strategy is proposed for speech
enhancement. A mathematical derivation supports the effectiveness of the
proposed Noisy2Noisyavg (N2Navg) strategy over the Noise2Noise (N2N)
strategy. Both the input and the target of a deep complex U-Net (DCU-Net)
are drawn solely from noisy speech samples, eliminating the need for a
large number of clean speech samples. The proposed method is compared with
state-of-the-art speech-denoising techniques. Experimental results
demonstrate that the proposed approach not only reduces the reliance on
clean targets but also mitigates the dependency on the large dataset
sizes typically associated with speech-denoising techniques.
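The general idea behind training on noisy targets can be illustrated with a toy example. The sketch below shows only the Noise2Noise principle (a squared-error fit to targets corrupted by independent zero-mean noise approaches the fit to clean targets); it does not implement N2Navg, and a linear least-squares model stands in for the DCU-Net. All names, sizes, and noise levels are hypothetical.

```python
import numpy as np

def fit_linear_denoiser(inputs, targets):
    """Least-squares weights W minimizing ||inputs @ W - targets||^2."""
    w, *_ = np.linalg.lstsq(inputs, targets, rcond=None)
    return w

rng = np.random.default_rng(0)
n, d = 20000, 4
clean = rng.normal(size=(n, d))                    # stand-in for clean features
noisy_in = clean + 0.5 * rng.normal(size=(n, d))   # noisy network input
noisy_tgt = clean + 0.5 * rng.normal(size=(n, d))  # independently noisy target

w_clean = fit_linear_denoiser(noisy_in, clean)      # supervised with clean targets
w_noisy = fit_linear_denoiser(noisy_in, noisy_tgt)  # supervised with noisy targets
max_gap = np.max(np.abs(w_clean - w_noisy))         # small for large n
```

Because the target noise is zero-mean and independent of the input, it averages out of the squared-error minimizer, so the two sets of weights differ only by sampling error in this toy setting.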
In summary, this research addresses the limitations of current cochlear
implant algorithms by proposing novel approaches for TFS encoding, noise
reduction, and deep learning-based speech enhancement. The findings
contribute to improving speech perception and intelligibility for individuals
with cochlear implants, providing insights for further advancements in the
field.
Keywords
Cochlear implants, Pitch shifting, Speech enhancement, Speech recognition