Text Augmentation for Enhancing the Text Classification for Low Resource Language
Date
2025
Publisher
Springer
Abstract
Text augmentation is the technique of producing additional data from a small corpus to improve the performance of prediction models. This work focuses on the pivotal role of text augmentation in Natural Language Processing (NLP). It tackles two significant challenges within the field: first, the adaptation of augmentation techniques for low-resource languages, where labeled data is scarce, and second, the enhancement of text classification across diverse domains, including sentiment analysis, topic classification, and spam detection. This research leverages state-of-the-art transformer-based models such as BERT and GPT-2 to ensure the adaptability and effectiveness of these augmentation techniques. The goal is to make NLP more accessible and impactful for low-resource languages by overcoming the challenges of data scarcity, and to improve the accuracy and applicability of text classification models across a wide range of applications. Using two Swedish datasets as a test case for low-resource languages, we demonstrate the effectiveness of our techniques through thorough empirical testing, as measured by F1 scores. Our findings highlight how augmented data improves classification performance in situations with limited resources. By exploring various augmentation methods and their applications, this research contributes to advancing NLP solutions for both language-specific and classification-related challenges. © The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd. 2025.
Keywords
Back translation, BERT, GPT, NLP, T5, Text augmentation, Text classification
Citation
SN Computer Science, 2025, 6, 6, pp. -
