Conference Papers
Permanent URI for this collection: https://idr.nitk.ac.in/handle/123456789/28506
Item: Conversational Hate-Offensive detection in Code-Mixed Hindi-English Tweets
(CEUR-WS, 2021) Rajalakshmi, R.; Srivarshan, S.; Mattins, F.; Kaarthik, E.; Seshadri, P.; Anand Kumar, M.
Hate speech on social media has increased with the growing use of online forums for sharing opinions. In particular, people prefer to express their views in their native language when posting such objectionable content on social media platforms. Building an automated system to identify hate and offensive tweets in many regional languages is challenging because of the rich linguistic nature of these languages. Recently, the problem has become more complicated owing to the use of multilingual and code-mixed tweets. Code-mixed data mixes two languages at a granular level, and may contain words that belong to neither language. To address these challenges in Hindi-English tweets, we propose an efficient method that combines IndicBERT with an effective ensemble-based method. We applied different methodologies to classify accurately whether a given tweet in the code-mixed Hinglish dataset is Hate Speech or Not. Three different models, namely IndicBERT, XLM Roberta and Masked LM, were used to embed the tweet data. Various classification methods, such as Logistic Regression, Support Vector Machine, Ensembling and Neural Network-based methods, were then applied to perform classification. In extensive experiments on the dataset, embedding the code-mixed data with IndicBERT and classifying with Ensembling was found to be the best method, resulting in a macro F1-score of 62.53%. This work was submitted to the shared task of HASOC 2021 [1] [2], the Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages competition, by team TNLP.
© 2021 Copyright for this paper by its authors.

Item: Tamil NLP Technologies: Challenges, State of the Art, Trends and Future Scope
(Springer Science and Business Media Deutschland GmbH, 2023) Rajendran, S.; Anand Kumar, M.; Rajalakshmi, R.; Dhanalakshmi, V.; Balasubramanian, P.; Padannayil, K.P.
This paper aims to summarize the NLP-based technological development of the Tamil language. Tamil is one of the Dravidian languages in which technological development is taken seriously. This is reflected in the work on developing language technology tools and in the resources created for technological development. Tools and systems have been successfully developed for Tamil speech synthesis and recognition, for the analysis of grammar, semantics and social media text, and for machine translation. Much research has been undertaken towards this achievement. Similarly, many activities are developing resources to facilitate technological development. These activities include preparing monolingual, parallel and lexical text corpora, along with speech corpora, lexical resources and grammars. What is needed now is to take stock of the achievements made so far, find out where Tamil stands in the arena of technological development, and look ahead to its rapid further development. Computational linguistics in Tamil NLP is gaining more attention, and the various datasets available for research are highlighted in this work for further exploration.
© 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
