Text Augmentation for Enhancing the Text Classification for Low Resource Language

Kumar, K.; Rudra, B.

dc.contributor.author	Kumar, K.
dc.contributor.author	Rudra, B.
dc.date.accessioned	2026-02-03T13:19:35Z
dc.date.issued	2025
dc.description.abstract	Text augmentation is the technique of producing more data from a small corpus to improve the performance of prediction models. This work focuses on the pivotal role of text augmentation in Natural Language Processing (NLP). It tackles two significant challenges within the field: first, the adaptation of augmentation techniques for low-resource languages, where labeled data is scarce, and second, the enhancement of text classification across diverse domains, including sentiment analysis, topic classification, and spam detection. This research leverages state-of-the-art transformer-based models like BERT and GPT-2 to ensure the adaptability and effectiveness of these augmentation techniques. The goal is to make NLP more accessible and impactful for low-resource languages by overcoming the challenges of data scarcity, and to improve the accuracy and applicability of text classification models across a wide range of applications. Using two Swedish datasets as a paradigm for low-resource languages, we demonstrate the effectiveness of our techniques through thorough empirical testing, as measured by F1 scores. Our findings highlight how enhanced data improves classification performance in situations with limited resources. By exploring various augmentation methods and their applications, this research contributes to advancing NLP solutions for both language-specific and classification-related challenges, pushing the boundaries of text augmentation’s capabilities in the field of NLP. © The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd. 2025.
dc.identifier.citation	SN Computer Science, 2025, 6, 6, pp. -
dc.identifier.issn	2662995X
dc.identifier.uri	https://doi.org/10.1007/s42979-025-04120-z
dc.identifier.uri	https://idr.nitk.ac.in/handle/123456789/20155
dc.publisher	Springer
dc.subject	Back translation
dc.subject	BERT
dc.subject	GPT
dc.subject	NLP
dc.subject	T5
dc.subject	Text augmentation
dc.subject	Text classification
dc.title	Text Augmentation for Enhancing the Text Classification for Low Resource Language

Collections

Journal Articles
