Overlapping word removal is all you need: revisiting data imbalance in hope speech detection

RamakrishnaIyer LekshmiAmmal, H.; Ravikiran, M.; Nisha, G.; Balamuralidhar, N.; Madhusoodanan, A.; Anand Kumar, A.K.; Chakravarthi, B.R.
2026-02-04
2024
Journal of Experimental and Theoretical Artificial Intelligence, 2024, 36, 8, pp. 1837-1859
0952813X
https://doi.org/10.1080/0952813X.2023.2166130
https://idr.nitk.ac.in/handle/123456789/21522

Hope speech detection is a new task for finding and highlighting positive comments or supporting content in user-generated social media comments. For this task, we used the Shared Task multilingual dataset on Hope Speech Detection for Equality, Diversity, and Inclusion (HopeEDI), which covers three languages: English, code-switched Tamil, and Malayalam. In this paper, we present deep learning techniques that use context-aware string embeddings for word representations, and Recurrent Neural Network (RNN) and pooled document embeddings for text representation. We evaluated and compared the three models for each language using different approaches. The proposed methodology achieves higher performance than the baselines. The highest weighted average F-scores of 0.93, 0.58, and 0.84 are obtained on the task organisers' final evaluation test set. The proposed models outperform the baselines by 3%, 2%, and 11% in absolute terms for English, Tamil, and Malayalam, respectively.

© 2023 Informa UK Limited, trading as Taylor & Francis Group.

Classification (of information); Modeling languages; Recurrent neural networks; Speech recognition; Text processing; Data imbalance; Embeddings; Focal loss; Hope speech detection; Language model; Malayalams; Speech detection; Text classification; User-generated; Word removals