Text Augmentation for Enhancing the Text Classification for Low Resource Language

Date

2025

Publisher

Springer

Abstract

Text augmentation is the technique of producing more data from a small corpus to improve the performance of prediction models. This work focuses on the pivotal role of text augmentation in Natural Language Processing (NLP). It tackles two significant challenges within the field: first, the adaptation of augmentation techniques for low-resource languages, where labeled data is scarce, and second, the enhancement of text classification across diverse domains, including sentiment analysis, topic classification, and spam detection. This research leverages state-of-the-art transformer-based models such as BERT and GPT-2 to ensure the adaptability and effectiveness of these augmentation techniques. The goal is to make NLP more accessible and impactful for low-resource languages, overcoming the challenges of data scarcity and improving the accuracy and applicability of text classification models across a wide range of applications. Using two Swedish datasets as a paradigm for low-resource languages, we demonstrate the effectiveness of our techniques through thorough empirical testing, as measured by F1 scores. Our findings highlight how augmented data improves classification performance in situations with limited resources. By exploring various augmentation methods and their applications, this research contributes to advancing NLP solutions for both language-specific and classification-related challenges, pushing the boundaries of text augmentation's capabilities in the field of NLP. © The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd. 2025.
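The augmentation methods named in the abstract (back translation, BERT- and GPT-2-based generation) require external translation or language models. As a minimal, self-contained illustration of the general idea of producing more labeled examples from a small corpus, the sketch below implements two simple token-level perturbations (random swap and random deletion, in the spirit of the EDA family of techniques); it is not the paper's method, and all function names and parameters here are illustrative.

```python
import random

def random_swap(words, n_swaps, rng):
    """Return a copy of `words` with `n_swaps` random position swaps."""
    words = words[:]
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words

def random_deletion(words, p_delete, rng):
    """Drop each word independently with probability `p_delete`."""
    kept = [w for w in words if rng.random() > p_delete]
    # Guard against deleting everything: keep at least one word.
    return kept or [rng.choice(words)]

def augment(sentence, n_variants=4, p_delete=0.1, seed=0):
    """Produce perturbed variants of a labeled sentence for training."""
    rng = random.Random(seed)  # seeded for reproducible augmentation
    words = sentence.split()
    variants = []
    for _ in range(n_variants):
        variants.append(" ".join(random_swap(words, 1, rng)))
        variants.append(" ".join(random_deletion(words, p_delete, rng)))
    return variants

# Example: augment one (hypothetical) Swedish training sentence.
new_examples = augment("texten är kort och enkel")
```

Each original sentence yields several label-preserving variants, which is how a small corpus is expanded before fine-tuning a classifier; back translation would replace these perturbations with a round trip through a pivot language.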

Keywords

Back translation, BERT, GPT, NLP, T5, Text augmentation, Text classification

Citation

SN Computer Science, 2025, 6(6).
