Enhancing Speech De-Identification with LLM-Based Data Augmentation
Date
2024
Publisher
Institute of Electrical and Electronics Engineers Inc.
Abstract
This paper addresses the challenge of data scarcity in speech de-identification by introducing a novel, fully automated data augmentation method leveraging large language models. Our approach overcomes the limitations of human annotation, enabling the creation of extensive training datasets. To enhance de-identification performance, we compare pipeline and end-to-end models. While the pipeline approach sequentially applies speech recognition and named entity recognition, the end-to-end model jointly learns these tasks. Experimental results demonstrate the effectiveness of our data augmentation strategy and the superiority of the end-to-end model in improving PII detection accuracy and robustness. © 2024 IEEE.
Keywords
Data augmentation, de-identification, named entity recognition, speech recognition
Citation
2024 11th International Conference on Advanced Informatics: Concept, Theory and Application, ICAICTA 2024, 2024
