Text Augmentation for Enhancing the Text Classification for Low Resource Language

dc.contributor.author: Kumar, K.
dc.contributor.author: Rudra, B.
dc.date.accessioned: 2026-02-03T13:19:35Z
dc.date.issued: 2025
dc.description.abstract: Text augmentation is the technique of producing more data from a small corpus to improve the performance of prediction models. This work focuses on the pivotal role of text augmentation in Natural Language Processing (NLP). It tackles two significant challenges within the field: first, the adaptation of augmentation techniques for low-resource languages, where labeled data is scarce, and second, the enhancement of text classification across diverse domains, including sentiment analysis, topic classification, and spam detection. This research leverages state-of-the-art transformer-based models like BERT and GPT-2 to ensure the adaptability and effectiveness of these augmentation techniques. The goal is to make NLP more accessible and impactful for low-resource languages by overcoming the challenges of data scarcity, and to improve the accuracy and applicability of text classification models across a wide range of applications. Using two Swedish datasets as a paradigm for low-resource languages, we demonstrate the effectiveness of our techniques through thorough empirical testing, as measured by F1 scores. Our findings highlight how augmented data improves classification performance in situations with limited resources. By exploring various augmentation methods and their applications, this research contributes to advancing NLP solutions for both language-specific and classification-related challenges, pushing the boundaries of text augmentation's capabilities in the field of NLP. © The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd. 2025.
dc.identifier.citation: SN Computer Science, 2025, 6, 6, pp. -
dc.identifier.issn: 2662995X
dc.identifier.uri: https://doi.org/10.1007/s42979-025-04120-z
dc.identifier.uri: https://idr.nitk.ac.in/handle/123456789/20155
dc.publisher: Springer
dc.subject: Back translation
dc.subject: BERT
dc.subject: GPT
dc.subject: NLP
dc.subject: T5
dc.subject: Text augmentation
dc.subject: Text classification
dc.title: Text Augmentation for Enhancing the Text Classification for Low Resource Language
