Browsing by Author "Dhingra, P."

Enhancing Speech De-Identification with LLM-Based Data Augmentation
(Institute of Electrical and Electronics Engineers Inc., 2024) Dhingra, P.; Agrawal, S.; Veerappan, C.S.; Chng, E.S.; Tong, R.
This paper addresses the challenge of data scarcity in speech de-identification by introducing a novel, fully automated data augmentation method leveraging large language models. Our approach overcomes the limitations of human annotation, enabling the creation of extensive training datasets. To enhance de-identification performance, we compare pipeline and end-to-end models. While the pipeline approach sequentially applies speech recognition and named entity recognition, the end-to-end model jointly learns these tasks. Experimental results demonstrate the effectiveness of our data augmentation strategy and the superiority of the end-to-end model in improving PII detection accuracy and robustness. © 2024 IEEE.
Speech de-identification data augmentation leveraging large language model
(Institute of Electrical and Electronics Engineers Inc., 2024) Dhingra, P.; Agrawal, S.; Veerappan, C.S.; Ho, T.N.; Chng, E.S.; Tong, R.
This work addresses the challenge of limited real-world speech data in speech de-identification, the process of removing Personally Identifiable Information (PII). We formulate speech de-identification as a named entity recognition (NER) task specifically for spoken English. To overcome data scarcity and enhance NER performance, we propose a data augmentation approach. This approach leverages a large language model to generate synthetic speech-style text data enriched with diverse PII entities. The generated data undergoes an iterative process using a customized NER model for semi-automatic PII annotation. Our analysis demonstrates the effectiveness of this data augmentation strategy in significantly improving NER performance on spoken language text. Furthermore, to gain deeper insights into the specific errors made during NER, we employ performance analysis using alternative evaluation metrics. © 2024 IEEE.