Overlapping word removal is all you need: revisiting data imbalance in hope speech detection
Date
2024
Publisher
Taylor and Francis Ltd.
Abstract
Hope speech detection is a new task that finds and highlights positive or supportive content in user-generated social media comments. For this task, we used the multilingual shared-task dataset on Hope Speech Detection for Equality, Diversity, and Inclusion (HopeEDI), which covers three languages: English, code-switched Tamil and Malayalam. In this paper, we present deep learning techniques that use context-aware string embeddings for word representations, and Recurrent Neural Network (RNN) and pooled document embeddings for text representation. We evaluated and compared three models for each language, using different approaches. The proposed methodology achieved higher performance than the baselines. The highest weighted average F-scores of 0.93, 0.58, and 0.84 were obtained on the task organisers' final evaluation test set. The proposed models outperform the baselines by 3%, 2% and 11% in absolute terms for English, Tamil and Malayalam respectively. © 2023 Informa UK Limited, trading as Taylor & Francis Group.
Keywords
Classification (of information), Modeling languages, Recurrent neural networks, Speech recognition, Text processing, Data imbalance, Embeddings, Focal loss, Hope speech detection, Language model, Malayalams, Speech detection, Text classification, User-generated, Word removals
Citation
Journal of Experimental and Theoretical Artificial Intelligence, 2024, 36, 8, pp. 1837-1859
