Enhancing Speech De-Identification with LLM-Based Data Augmentation

Date

2024

Publisher

Institute of Electrical and Electronics Engineers Inc.

Abstract

This paper addresses data scarcity in speech de-identification by introducing a novel, fully automated data augmentation method that leverages large language models (LLMs). Our approach removes the dependence on human annotation and enables the creation of large training datasets. To improve de-identification performance, we compare two architectures: a pipeline model, which applies speech recognition and named entity recognition in sequence, and an end-to-end model, which learns both tasks jointly. Experimental results demonstrate the effectiveness of our data augmentation strategy and show that the end-to-end model outperforms the pipeline in the accuracy and robustness of personally identifiable information (PII) detection. © 2024 IEEE.
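The data flow of the pipeline approach (speech recognition followed by named entity recognition) can be illustrated with a minimal sketch. It is not the authors' implementation: `recognize_speech` and `tag_entities` are hypothetical stand-ins (a pre-timed word list and a dictionary lookup) for a real ASR model and a trained NER model, and `PII_LEXICON` is an assumed toy vocabulary. The sketch only shows how entity tags on a timed transcript yield both a redacted transcript and the audio intervals that could be muted.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # start time in seconds
    end: float    # end time in seconds


def recognize_speech(audio: list[tuple[str, float, float]]) -> list[Word]:
    # Hypothetical ASR stand-in: the "audio" is already a list of timed words.
    return [Word(text, start, end) for text, start, end in audio]


# Hypothetical toy lexicon standing in for a trained NER model.
PII_LEXICON = {"john": "PERSON", "smith": "PERSON", "jakarta": "LOCATION"}


def tag_entities(words: list[Word]) -> list[str]:
    # Assign an entity label to each word, or "O" for non-entity words.
    return [PII_LEXICON.get(w.text.lower().strip(".,"), "O") for w in words]


def deidentify(audio: list[tuple[str, float, float]]) -> tuple[str, list[tuple[float, float]]]:
    # Stage 1: speech recognition; Stage 2: named entity recognition.
    words = recognize_speech(audio)
    tags = tag_entities(words)
    redacted: list[str] = []
    intervals: list[tuple[float, float]] = []
    for word, tag in zip(words, tags):
        if tag == "O":
            redacted.append(word.text)
        else:
            # Replace the entity with its label and record its time span for muting.
            redacted.append(f"[{tag}]")
            intervals.append((word.start, word.end))
    return " ".join(redacted), intervals


if __name__ == "__main__":
    sample = [("my", 0.0, 0.2), ("name", 0.2, 0.5), ("is", 0.5, 0.6),
              ("John", 0.6, 0.9), ("Smith", 0.9, 1.3)]
    print(deidentify(sample))
```

Because the two stages are separate, any recognition error in stage 1 propagates into stage 2; jointly learning both tasks, as the end-to-end model does, is one way to avoid this cascade.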

Keywords

Data augmentation, de-identification, named entity recognition, speech recognition

Citation

2024 11th International Conference on Advanced Informatics: Concept, Theory and Application (ICAICTA 2024), 2024.