Multilingual Models for Sentiment and Abusive Language Detection for Dravidian Languages

Anand Kumar, M.

Date

2023

Authors

Anand Kumar, M.

Publisher

Incoma Ltd

Abstract

This work addresses abusive comment detection and sentiment analysis in code-mixed content in Dravidian languages, specifically Tulu and Tamil. Two approaches are compared: a TF-IDF-based Long Short-Term Memory (LSTM) model and Hierarchical Attention Networks (HAN). The results show that the traditional TF-IDF-based models outperform the Hierarchical Attention models in both sentiment analysis and abusive language identification across Tulu and Tamil. The Tulu sentiment analysis system performs particularly well on the Positive and Neutral classes, whereas the Tamil sentiment analysis system performs comparatively worse. This gap points to the need for better-balanced datasets and further research to improve sentiment analysis accuracy for Tamil. For abusive language detection, the TF-IDF-LSTM models consistently outperform the Hierarchical Attention models. Notably, the mixed models (which combine code-mixed and original-script data) are especially strong at classifying categories such as "Homophobia" and "Xenophobia". This result highlights the value of incorporating both code-mixed and original-script data and opens new avenues for social media analysis research in multilingual settings involving Dravidian languages. © 2023 LTEDI 2023 - 3rd Workshop on Language Technology for Equality, Diversity and Inclusion, associated with the 14th International Conference on Recent Advances in Natural Language Processing, RANLP 2023 - Proceedings. All rights reserved.
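
The abstract names TF-IDF features as the input representation for the LSTM classifiers. The following is a minimal sketch of standard TF-IDF weighting only, not the authors' implementation: it assumes whitespace tokenization, lowercasing, tf = count / document length and idf = log(N / df). The paper's actual tokenization, normalization, and the way the vectors are fed into the LSTM are not described in this record. The example comments are hypothetical romanized Tamil phrases, not taken from the shared-task data.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document (plain, unsmoothed TF-IDF).

    tf  = count of the term in the document / number of tokens in the document
    idf = log(N / df), where df = number of documents containing the term
    """
    # Assumption: lowercase + whitespace split; real code-mixed text usually
    # needs more careful normalization (transliteration variants, emoji, etc.).
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: count each term at most once per document.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        length = len(tokens)
        vectors.append({
            term: (count / length) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical code-mixed comments (illustrative only).
comments = ["padam semma mass", "padam mokka", "semma padam bro"]
vectors = tf_idf(comments)
# 'padam' occurs in every comment, so its weight is 0.0
print(vectors[0])
```

A term that appears in every comment carries no discriminative weight, while rarer terms such as "mass" score higher. Library implementations differ in detail; for example, scikit-learn's `TfidfVectorizer` uses a smoothed idf by default, so its values will not match this plain formulation.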

Citation

LTEDI 2023 - 3rd Workshop on Language Technology for Equality, Diversity and Inclusion, associated with the 14th International Conference on Recent Advances in Natural Language Processing, RANLP 2023 - Proceedings, 2023, pp. 17-24

URI

https://doi.org/10.26615/978-954-452-084-7_003
https://idr.nitk.ac.in/handle/123456789/29329

Collections

Conference Papers