Faculty Publications

Permanent URI for this community: https://idr.nitk.ac.in/handle/123456789/18736

Publications by NITK Faculty

Search Results

Now showing 1 - 5 of 5

Overview of the HASOC Track at FIRE 2020: Hate Speech and Offensive Language Identification in Tamil, Malayalam, Hindi, English and German
(Association for Computing Machinery, 2020) Mandl, T.; Modha, S.; Anand Kumar, M.; Chakravarthi, B.R.
This paper presents the HASOC track and its two parts. HASOC is dedicated to evaluating technology for finding offensive language and hate speech. HASOC is creating test collections for languages with few resources, and for English for comparison. The first track within HASOC continued work from 2019 and provided a testbed of Twitter posts for Hindi, German and English. The second track within HASOC created test resources for Tamil and Malayalam in native and Latin script. Posts were extracted mainly from YouTube and Twitter. Both tracks attracted much interest, and over 40 research groups participated and described their approaches in papers. In this overview, we present the tasks, the data and the main results. © 2020 ACM.
Findings of Shared Task on Offensive Language Identification in Tamil and Malayalam
(Association for Computing Machinery, 2021) Kumaresan, P.K.; Premjith; Sakuntharaj, R.; Thavareesan, S.; Subalalitha, S.; Anand Kumar, M.; Chakravarthi, B.R.; McCrae, J.P.
In this paper, we present the results of the HASOC-Dravidian-CodeMix shared task held at FIRE 2021, a track on offensive language identification for Dravidian languages in code-mixed text. The paper details the task, its organisation, and the submitted systems. The identification of offensive language was treated as a classification task. Sixteen teams participated in identifying offensive language in Tamil-English code-mixed data, 11 teams in Malayalam-English code-mixed data and 14 teams in Tamil data. The teams detected offensive language using various machine learning and deep learning classification models. This paper analyses those benchmark systems to find out how well they accommodate a code-mixed scenario in Dravidian languages, focusing on Tamil and Malayalam. © 2021 Owner/Author.
Hate speech review in the context of online social networks
(Elsevier Ltd, 2018) Chetty, N.; Alathur, S.
Advances in Internet Technologies (ITs) and online social networks have brought many benefits to humanity. At the same time, the dark side of this growth has led to increased hate speech and terrorism, which are among the most common and powerful threats globally. Hate speech is an offensive form of communication that expresses an ideology of hate using stereotypes. It targets different protected characteristics such as gender, religion, race, and disability. Hate speech can be controlled using different national and international legal frameworks. Any intentional act directed against life or related entities that causes a common danger is known as terrorism. Hate speech and terrorism are commonly discussed or debated separately, and most recent research articles have discussed either one or the other. Hate speech is a type of terrorism and follows an incident or trigger event of terrorism. Online social networks are a result of ITs and have evolved rapidly through their popularity among youth. As the two activities are closely related and both make use of online social networks, a joint discussion is appropriate. We therefore review hate speech, with its different classes, and terrorism, with its cyber use, in the framework of online social networks. With a combined effort from governments, Internet Service Providers (ISPs) and online social networks, proper policies can be framed to counter both hate speech and terrorism efficiently and effectively. © 2018 Elsevier Ltd
Automatic hate speech detection in audio using machine learning algorithms
(Springer, 2024) Imbwaga, J.L.; Chittaragi, N.B.; Koolagudi, S.G.
Even though every individual is entitled to freedom of speech, some limitations exist when this freedom is used to target and harm another individual or a group of people, as it then amounts to hate speech. This study deals with the detection of hate speech from audio for the English and Kiswahili languages. The dataset used in this work was collected manually from YouTube videos, which were then converted to audio. Audio-based features, namely spectral, temporal, prosodic and excitation source features, were extracted and used to train various machine learning classifiers. Initial experiments were conducted for English and later for Kiswahili. It is observed from the literature, however, that research on the Kiswahili language is comparatively scarce. The scores calculated for accuracy, recall, precision, AUC and F1 score in detecting hate speech suggest that the Random Forest classifier performed better for English, while the Extreme Gradient Boosting classifier performed better for Kiswahili. © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.
Explainable hate speech detection using LIME
(Springer, 2024) Imbwaga, J.L.; Chittaragi, N.B.; Koolagudi, S.G.
Free speech is essential, but it can conflict with protecting marginalized groups from harm caused by hate speech. Social media platforms have become breeding grounds for this harmful content. While studies exist to detect hate speech, there are significant research gaps. First, most studies used text data instead of other modalities such as video or audio. Second, most studies explored traditional machine learning algorithms. However, due to the increasing complexity of computational tasks, there is a need to employ more complex techniques and methodologies. Third, the majority of research studies have either been evaluated using very few evaluation metrics or not been statistically evaluated at all. Lastly, due to the opaque, black-box nature of complex classifiers, there is a need to use explainability techniques. This research aims to address these gaps by detecting hate speech in the English and Kiswahili languages using videos manually collected from YouTube. The videos were converted to text and used to train various classifiers. The performance of these classifiers was evaluated using various evaluation and statistical measurements. The experimental results suggest that the random forest classifier achieved the highest results for both languages across all evaluation measurements compared to all other classifiers used. The results for English were: accuracy 98%, AUC 96%, precision 99%, recall 97%, F1 98%, specificity 98% and MCC 96%, while the results for Kiswahili were: accuracy 90%, AUC 94%, precision 93%, recall 92%, F1 94%, specificity 87% and MCC 75%. These results suggest that the random forest classifier is robust, effective and efficient in detecting hate speech in any language. This also implies that the classifier is reliable in detecting hate speech and other related problems in social media. However, to understand the classifiers’ decision-making process, we used the Local Interpretable Model-agnostic Explanations (LIME) technique to explain the predictions made by the random forest classifier. © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.
