Enhancing Speech De-Identification with LLM-Based Data Augmentation
| dc.contributor.author | Dhingra, P. | |
| dc.contributor.author | Agrawal, S. | |
| dc.contributor.author | Veerappan, C.S. | |
| dc.contributor.author | Chng, E.S. | |
| dc.contributor.author | Tong, R. | |
| dc.date.accessioned | 2026-02-06T06:33:41Z | |
| dc.date.issued | 2024 | |
| dc.description.abstract | This paper addresses the challenge of data scarcity in speech de-identification by introducing a novel, fully automated data augmentation method leveraging large language models. Our approach overcomes the limitations of human annotation, enabling the creation of extensive training datasets. To enhance de-identification performance, we compare pipeline and end-to-end models. While the pipeline approach sequentially applies speech recognition and named entity recognition, the end-to-end model jointly learns these tasks. Experimental results demonstrate the effectiveness of our data augmentation strategy and the superiority of the end-to-end model in improving PII detection accuracy and robustness. © 2024 IEEE. | |
| dc.identifier.citation | 2024 11th International Conference on Advanced Informatics: Concept, Theory and Application, ICAICTA 2024, 2024 | |
| dc.identifier.uri | https://doi.org/10.1109/ICAICTA63815.2024.10762997 | |
| dc.identifier.uri | https://idr.nitk.ac.in/handle/123456789/28809 | |
| dc.publisher | Institute of Electrical and Electronics Engineers Inc. | |
| dc.subject | Data augmentation | |
| dc.subject | de-identification | |
| dc.subject | named entity recognition | |
| dc.subject | speech recognition | |
| dc.title | Enhancing Speech De-Identification with LLM-Based Data Augmentation | |
