Faculty Publications

Overview of the track on HASOC-offensive Language Identification-DravidianCodeMix
(CEUR-WS, 2020) Chakravarthi, B.R.; Anand Kumar, M.; McCrae, J.P.; Premjith, B.; Padannayil, K.P.; Mandl, T.
We present the results and main findings of the HASOC-Offensive Language Identification track on code-mixed Dravidian languages. The track featured two tasks. Task 1 is offensive language identification in Malayalam, where the comments were written in both native script and Latin script. Task 2 is offensive language identification in Tamil and Malayalam, where the comments were written in Latin script (non-native script). For both tasks, given a comment, participants were asked to develop a system that classifies the text as offensive or not-offensive. In total, 96 participants took part and 12 submitted papers. In this paper, we present the task, the data, and the results, and discuss the submitted systems and the methods used by participants. © 2020 Copyright for this paper by its authors.
Overview of the HASOC-DravidianCodeMix Shared Task on Offensive Language Detection in Tamil and Malayalam
(CEUR-WS, 2021) Chakravarthi, B.R.; Kumaresan, P.K.; Sakuntharaj, R.; Anand Kumar, M.; Thavareesan, S.; Premjith, B.; Sreelakshmi, K.; Subalalitha, S.C.; McCrae, J.P.; Mandl, T.
In this paper, we present the results of the HASOC-Dravidian-CodeMix shared task held at FIRE 2021, a track on offensive language identification for Dravidian languages in code-mixed text. The paper details the task, its organisation, and the submitted systems. The identification of offensive language was treated as a classification task. In total, 16 teams participated in identifying offensive language in Tamil-English code-mixed data, 11 teams in Malayalam-English code-mixed data, and 14 teams in Tamil data. The teams detected offensive language using various machine learning and deep learning classification models. This paper analyses those benchmark systems to find out how well they accommodate a code-mixed scenario in Dravidian languages, focusing on Tamil and Malayalam. © 2021 Copyright for this paper by its authors.
Context Sensitive Tamil Language Spellchecker Using RoBERTa
(Springer Science and Business Media Deutschland GmbH, 2023) Rajalakshmi, R.; Sharma, V.; Anand Kumar, M.
A spellchecker is a tool that identifies spelling errors in a piece of text and lists possible suggestions for each misspelled word. Many spellcheckers are available for languages such as English, but only a limited number of spell-checking tools exist for low-resource languages like Tamil. In this paper, we present an approach to developing a Tamil spellchecker using the RoBERTa (xlm-roberta-base) model. We also propose an algorithm that generates the test dataset by introducing errors into a piece of text. The spellchecker finds mistakes in a given text using a corpus of unique Tamil words collected from different sources, such as Wikipedia and Tamil conversations, and uses the proposed model to list suggestions that could be contextual replacements for the misspelled word. On introducing a few errors into a piece of text collected from a Wikipedia article and testing it on our model, an accuracy of 91.14% was achieved for error detection. Contextually correct words were then suggested for the detected erroneous words. Our spellchecker performed better than some of the existing Tamil spellcheckers, with both higher accuracy and fewer false positives. © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
A reasoning based explainable multimodal fake news detection for low resource language using large language models and transformers
(Springer Nature, 2025) LekshmiAmmal, H.R.; Anand Kumar, M.
Nowadays, individuals rely predominantly on online social media platforms, news feeds, websites, and news aggregator applications to acquire recent news stories. This trend has resulted in an increase in the number of available social media platforms, online news feeds, and news aggregator applications. These news platforms have been accused of spreading fake news to gain more attention and recognition. Earlier, this misinformation or fake news used to be propagated only in the text form. However, with the advent of technology, now it is spread in multimodal forms, such as images with text, videos, and audio with textual content. Currently, the automatic fake news detection models are focused on high resource languages and superficial output. Social media users need clarity and reasoning when it comes to identifying fake news, rather than just a superficial classification of news as fake. Providing context, reasoning, and explanations can help users understand why certain news is misleading or false. Hence, a multimodal system has to be developed to identify and justify fake news. In this proposed work, we have developed a multimodal fake news system for the Low Resource Language Tamil with reasoning-based explainability. The dataset for this proposed work is retrieved from fact-check websites and official news websites. We have experimented with different combinations of models for visual and text modalities. Further, we integrated LLM-based image descriptions into our model with the text and visual features, resulting in an F1 score of 0.8736. We used the Siamese model to determine the similarity of the news and its image descriptions. Additionally, we conducted error analysis and used explainable artificial intelligence to explore the reasoning behind our model’s predictions. We also present the textual reasoning for the model’s predictions and match them with images. © The Author(s) 2025.
