Transfer learning based code-mixed part-of-speech tagging using character level representations for Indian languages

Anand Kumar, A.K.; Padannayil, S.K.

dc.contributor.author	Anand Kumar, A.K.
dc.contributor.author	Padannayil, S.K.
dc.date.accessioned	2026-02-04T12:26:36Z
dc.date.issued	2023
dc.description.abstract	Massive amounts of unstructured content are generated every day on social media platforms such as Facebook, Twitter and blogs. Analyzing and extracting useful information from this vast amount of text is a challenging process. Social media currently provide extensive opportunities for researchers and practitioners to do research in this area. Most of the text content in social media tends to be either in English or in code-mixed regional languages. In a multilingual country like India, code-mixing is common in social media discussions. Multilingual users frequently use Roman script, a convenient mode of expression, instead of the regional language script for posting messages on social media, and often mix English into their native languages. Stylistic and grammatical irregularities are significant challenges in processing code-mixed text using conventional methods. This paper explains a new word embedding via character level representation as features for POS tagging code-mixed text in Indian languages, using the ICON-2015 and ICON-2016 NLP tools contest data sets. The proposed word embedding features are context-appended, and the well-known Support Vector Machine (SVM) classifier has been used to train the system. We have combined the Facebook, Twitter, and WhatsApp code-mixed data of three Indian languages to train a transfer learning based, language-independent and source-independent POS tagger. The experimental results demonstrated that the proposed transfer method achieved state-of-the-art accuracy in 12 out of 18 systems for the ICON data set. © 2021, The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.
dc.identifier.citation	Journal of Ambient Intelligence and Humanized Computing, 2023, 14, 6, pp. 7207-7218
dc.identifier.issn	18685137
dc.identifier.uri	https://doi.org/10.1007/s12652-021-03573-3
dc.identifier.uri	https://idr.nitk.ac.in/handle/123456789/21893
dc.publisher	Springer Science and Business Media Deutschland GmbH
dc.subject	Codes (symbols)
dc.subject	Computational linguistics
dc.subject	Embeddings
dc.subject	Natural language processing systems
dc.subject	Social networking (online)
dc.subject	Syntactics
dc.subject	Character and word embedding
dc.subject	Code-mixed script
dc.subject	ICON-2015
dc.subject	ICON-2016
dc.subject	Indian languages
dc.subject	Language processing
dc.subject	Natural language processing
dc.subject	Natural languages
dc.subject	Part of speech tagging
dc.subject	Parts-of-speech tagging
dc.subject	Social media
dc.subject	Support vectors machine
dc.subject	Transfer learning
dc.subject	Support vector machines
dc.title	Transfer learning based code-mixed part-of-speech tagging using character level representations for Indian languages
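
The abstract mentions word features built from character level representations that are then "context-appended" before being passed to an SVM. The record does not say how the embedding is computed, so the following is only a minimal sketch under stated assumptions: hashed character trigram counts stand in for the paper's embedding, the vector size and window width are arbitrary, the function names are hypothetical, and the example sentence is made up for illustration. It shows one plausible reading of context appending, namely concatenating each word's vector with the vectors of its neighbours.

```python
import zlib
from typing import List

DIM = 16  # assumed per-word vector size; not taken from the paper


def char_ngram_vector(word: str, n: int = 3, dim: int = DIM) -> List[float]:
    """Hash the character n-grams of a word into a fixed-length count vector.

    This is a stand-in for the paper's character level embedding,
    whose actual construction is not described in this record.
    """
    padded = f"<{word.lower()}>"  # boundary markers around the word
    vec = [0.0] * dim
    for i in range(max(1, len(padded) - n + 1)):
        gram = padded[i:i + n]
        # crc32 is deterministic across runs, unlike the built-in hash()
        vec[zlib.crc32(gram.encode("utf-8")) % dim] += 1.0
    return vec


def context_appended_features(
    tokens: List[str], window: int = 1, dim: int = DIM
) -> List[List[float]]:
    """Concatenate each word's vector with those of its neighbours.

    Positions outside the sentence are filled with zero vectors.
    """
    vectors = [char_ngram_vector(t, dim=dim) for t in tokens]
    pad = [0.0] * dim
    features = []
    for i in range(len(tokens)):
        row: List[float] = []
        for offset in range(-window, window + 1):
            j = i + offset
            row.extend(vectors[j] if 0 <= j < len(tokens) else pad)
        features.append(row)
    return features


# Made-up Romanized Hindi-English sentence, for illustration only
tokens = ["movie", "bahut", "acchi", "thi"]
features = context_appended_features(tokens)
```

Each row in `features` has `(2 * window + 1) * DIM` entries and could be given, together with its POS label, to any SVM implementation; pooling rows from several languages and sources into a single training set would correspond to the combined training the abstract describes.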

Collections

Journal Articles