Enhancing Speech De-Identification with LLM-Based Data Augmentation

dc.contributor.author	Dhingra, P.
dc.contributor.author	Agrawal, S.
dc.contributor.author	Veerappan, C.S.
dc.contributor.author	Chng, E.S.
dc.contributor.author	Tong, R.
dc.date.accessioned	2026-02-06T06:33:41Z
dc.date.issued	2024
dc.description.abstract	This paper addresses the challenge of data scarcity in speech de-identification by introducing a novel, fully automated data augmentation method leveraging large language models. Our approach overcomes the limitations of human annotation, enabling the creation of extensive training datasets. To enhance de-identification performance, we compare pipeline and end-to-end models. While the pipeline approach sequentially applies speech recognition and named entity recognition, the end-to-end model jointly learns these tasks. Experimental results demonstrate the effectiveness of our data augmentation strategy and the superiority of the end-to-end model in improving PII detection accuracy and robustness. © 2024 IEEE.
dc.identifier.citation	2024 11th International Conference on Advanced Informatics: Concept, Theory and Application, ICAICTA 2024, 2024
dc.identifier.uri	https://doi.org/10.1109/ICAICTA63815.2024.10762997
dc.identifier.uri	https://idr.nitk.ac.in/handle/123456789/28809
dc.publisher	Institute of Electrical and Electronics Engineers Inc.
dc.subject	Data augmentation
dc.subject	de-identification
dc.subject	named entity recognition
dc.subject	speech recognition
dc.title	Enhancing Speech De-Identification with LLM-Based Data Augmentation
