Overlapping word removal is all you need: revisiting data imbalance in hope speech detection

Date

2024

Publisher

Taylor and Francis Ltd.

Abstract

Hope speech detection is a new task for finding and highlighting positive comments or supportive content in user-generated social media comments. For this task, we used the Shared Task multilingual dataset on Hope Speech Detection for Equality, Diversity, and Inclusion (HopeEDI) for three languages: English, code-switched Tamil, and Malayalam. In this paper, we present deep learning techniques using context-aware string embeddings for word representations, with Recurrent Neural Network (RNN) and pooled document embeddings for text representation. We evaluated and compared three models for each language using different approaches. Our proposed methodology achieves higher performance than the baselines, with the highest weighted average F-scores of 0.93, 0.58, and 0.84 on the task organisers' final evaluation test set. The proposed models outperform the baselines by 3%, 2%, and 11% in absolute terms for English, Tamil, and Malayalam, respectively. © 2023 Informa UK Limited, trading as Taylor & Francis Group.
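The pooled document embeddings mentioned above combine word-level vectors into a single fixed-length vector for the whole comment. A minimal sketch of one common pooling choice, mean pooling, is shown below; the vocabulary and vector values are illustrative assumptions, not the paper's actual embeddings (the paper uses context-aware string embeddings rather than a static lookup table).

```python
# Hypothetical sketch: pooled document embeddings via mean pooling.
# Each word maps to a vector; the document vector is the element-wise
# mean of its word vectors. Values below are toy examples only.

from typing import Dict, List


def pool_document(tokens: List[str],
                  word_vectors: Dict[str, List[float]],
                  dim: int) -> List[float]:
    """Mean-pool word vectors into a single document embedding."""
    pooled = [0.0] * dim
    count = 0
    for tok in tokens:
        vec = word_vectors.get(tok)
        if vec is None:  # skip out-of-vocabulary words
            continue
        for i, value in enumerate(vec):
            pooled[i] += value
        count += 1
    if count:
        pooled = [v / count for v in pooled]
    return pooled


# Toy 2-dimensional vocabulary (illustrative values only).
vocab = {"hope": [1.0, 0.0], "speech": [0.0, 1.0]}
doc = pool_document(["hope", "speech"], vocab, dim=2)
# doc == [0.5, 0.5]
```

The resulting document vector would then be fed to a classifier; min- or max-pooling are common alternatives to the mean shown here.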

Keywords

Classification (of information), Modeling languages, Recurrent neural networks, Speech recognition, Text processing, Data imbalance, Embeddings, Focal loss, Hope speech detection, Language model, Malayalam, Speech detection, Text classification, User-generated, Word removal

Citation

Journal of Experimental and Theoretical Artificial Intelligence, 2024, 36(8), pp. 1837-1859
