Transfer learning based code-mixed part-of-speech tagging using character level representations for Indian languages

dc.contributor.author: Anand Kumar, A.K.
dc.contributor.author: Padannayil, S.K.
dc.date.accessioned: 2026-02-04T12:26:36Z
dc.date.issued: 2023
dc.description.abstract: Massive amounts of unstructured content are generated every day on social media platforms such as Facebook, Twitter, and blogs. Analyzing and extracting useful information from this vast amount of text is a challenging process. Social media has provided extensive opportunities for researchers and practitioners to study this area. Most text content on social media tends to be either in English or in code-mixed regional languages. In a multilingual country like India, code-mixing is common in social media discussions. Multilingual users frequently post messages in Roman script, a convenient mode of expression, instead of the regional language script, and often mix English into their native languages. Stylistic and grammatical irregularities are significant challenges when processing code-mixed text with conventional methods. This paper presents a new word embedding built from character-level representations, used as features for POS tagging of code-mixed text in Indian languages on the ICON-2015 and ICON-2016 NLP tools contest data sets. The proposed word embedding features are context-appended, and the well-known Support Vector Machine (SVM) classifier is used to train the system. We combined the Facebook, Twitter, and WhatsApp code-mixed data of three Indian languages to train a transfer-learning-based, language-independent and source-independent POS tagger. The experimental results show that the proposed transfer method achieved state-of-the-art accuracy in 12 of 18 systems on the ICON data set. © 2021, The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.
dc.identifier.citation: Journal of Ambient Intelligence and Humanized Computing, 2023, 14, 6, pp. 7207-7218
dc.identifier.issn: 18685137
dc.identifier.uri: https://doi.org/10.1007/s12652-021-03573-3
dc.identifier.uri: https://idr.nitk.ac.in/handle/123456789/21893
dc.publisher: Springer Science and Business Media Deutschland GmbH
dc.subject: Codes (symbols)
dc.subject: Computational linguistics
dc.subject: Embeddings
dc.subject: Natural language processing systems
dc.subject: Social networking (online)
dc.subject: Syntactics
dc.subject: Character and word embedding
dc.subject: Code-mixed script
dc.subject: ICON-2015
dc.subject: ICON-2016
dc.subject: Indian languages
dc.subject: Language processing
dc.subject: Natural language processing
dc.subject: Natural languages
dc.subject: Part of speech tagging
dc.subject: Parts-of-speech tagging
dc.subject: Social media
dc.subject: Support vectors machine
dc.subject: Transfer learning
dc.subject: Support vector machines
dc.title: Transfer learning based code-mixed part-of-speech tagging using character level representations for Indian languages
