Faculty Publications

Permanent URI for this community: https://idr.nitk.ac.in/handle/123456789/18736

Publications by NITK Faculty

Overlapping word removal is all you need: revisiting data imbalance in hope speech detection
(Taylor and Francis Ltd., 2024) RamakrishnaIyer LekshmiAmmal, H.; Ravikiran, M.; Nisha, G.; Balamuralidhar, N.; Madhusoodanan, A.; Anand Kumar, A.K.; Chakravarthi, B.R.
Hope speech detection is a new task for finding and highlighting positive comments or supporting content from user-generated social media comments. For this task, we have used a Shared Task multilingual dataset on Hope Speech Detection for Equality, Diversity, and Inclusion (HopeEDI) for three languages: English, code-switched Tamil and Malayalam. In this paper, we present deep learning techniques using context-aware string embeddings for word representations and Recurrent Neural Network (RNN) and pooled document embeddings for text representation. We have evaluated and compared the three models for each language with different approaches. Our proposed methodology performs well and achieves higher performance than the baselines. The highest weighted average F-scores of 0.93, 0.58, and 0.84 are obtained on the task organisers' final evaluation test set. The proposed models outperform the baselines by 3%, 2% and 11% in absolute terms for English, Tamil and Malayalam respectively. © 2023 Informa UK Limited, trading as Taylor & Francis Group.
The Effect of Phrase Vector Embedding in Explainable Hierarchical Attention-Based Tamil Code-Mixed Hate Speech and Intent Detection
(Institute of Electrical and Electronics Engineers Inc., 2024) Sharmila Devi, V.S.; Subramanian, S.; Anand Kumar, A.K.
The substantial growth in social media users has led to a significant increase in code-mixed content on social media platforms. Millions of users on these platforms upload pictures and videos and post comments regarding their recent or exciting activities. Responding to this uploaded content, a few users occasionally use offensive language to insult others or specific groups. Social media platforms encounter challenges in identifying and removing hate speech and objectionable content in various languages. Hate speech, in its general sense, refers to harmful posts directed at individuals or groups based on factors such as their sexuality, religion, community affiliation, disability, and others. Typically, offensive language is directly or indirectly utilized in hate speech posts to insult someone, causing psychological distress to users. In light of this, we propose developing a system to automatically block, remove, or report posts written in code-mixed Tamil containing hate speech. We have gathered code-mixed Tamil comments from Twitter and the Helo App, categorizing them as hate speech and classifying their intent. We have identified three categories of hate speech intent, namely Targeted Individual (TI), Targeted Group (TG), and Others (O). The Targeted Individual (TI) class encompasses posts aimed at a specific individual target, while the Targeted Group (TG) category primarily focuses on posts that target people based on their religion, community, gender, and other characteristics. The Others (O) category encompasses untargeted offensive posts and other posts containing offensive language. In this context, we propose using a phrase-based, Explainable Hierarchical Attention model for hate speech detection. The results demonstrate that the proposed method is more effective in identifying and explaining hate speech and offensive language in social media posts. © 2013 IEEE.
Automatic hate speech detection in audio using machine learning algorithms
(Springer, 2024) Imbwaga, J.L.; Chittaragi, N.B.; Koolagudi, S.G.
Even though every individual is entitled to freedom of speech, some limitations exist when this freedom is used to target and harm another individual or a group of people, as it translates to hate speech. This study deals with the detection of hate speech in English and Kiswahili from audio. The dataset used in this work was collected manually from YouTube videos and then converted to audio. Audio-based features, namely spectral, temporal, prosodic and excitation source features, were extracted and used to train various machine learning classifiers. Initial experiments were conducted for English and later for Kiswahili. However, the literature shows that research on Kiswahili is comparatively scarce. The accuracy, recall, precision, AUC and F1 scores for detecting hate speech suggest that the Random Forest classifier performed better for English, while the Extreme Gradient Boosting classifier performed better for Kiswahili. © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.
Explainable hate speech detection using LIME
(Springer, 2024) Imbwaga, J.L.; Chittaragi, N.B.; Koolagudi, S.G.
Free speech is essential, but it can conflict with protecting marginalized groups from harm caused by hate speech. Social media platforms have become breeding grounds for this harmful content. While studies exist to detect hate speech, there are significant research gaps. First, most studies used text data instead of other modalities such as videos or audio. Second, most studies explored traditional machine learning algorithms. However, due to the increasing complexity of computational tasks, there is a need to employ more complex techniques and methodologies. Third, the majority of research studies have either been evaluated using very few evaluation metrics or not been statistically evaluated at all. Lastly, due to the opaque, black-box nature of complex classifiers, there is a need to use explainability techniques. This research aims to address these gaps by detecting hate speech in English and Kiswahili using videos manually collected from YouTube. The videos were converted to text and used to train various classifiers. The performance of these classifiers was evaluated using various evaluation and statistical measurements. The experimental results suggest that the random forest classifier achieved the highest results for both languages across all evaluation measurements compared to all classifiers used. The results for English were: accuracy 98%, AUC 96%, precision 99%, recall 97%, F1 98%, specificity 98% and MCC 96%, while the results for Kiswahili were: accuracy 90%, AUC 94%, precision 93%, recall 92%, F1 94%, specificity 87% and MCC 75%. These results suggest that the random forest classifier is robust, effective and efficient in detecting hate speech in any language. This also implies that the classifier is reliable in detecting hate speech and other related problems in social media. However, to understand the classifiers' decision-making process, we used the Local Interpretable Model-agnostic Explanations (LIME) technique to explain the predictions achieved by the random forest classifier. © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.