Conference Papers

Permanent URI for this collection: https://idr.nitk.ac.in/handle/123456789/28506

  • Item
    LSTM-Attention Architecture for Online Bilingual Sexism Detection
    (CEUR-WS, 2023) Ravi, S.; Kelkar, S.; Anand Kumar, M.
    The paper describes the submission of ‘Team-SMS’ to EXIST 2023. The task organizers provided a dataset of 6920 tweets for training, 1038 for validation, and 2076 for testing. Our models include LSTM models with and without attention layers. To compute the soft scores required by the task, we tried to mimic human performance by averaging the predictions of several machine learning models: Multinomial Naive Bayes, Linear Support Vector Classifier, Multi-Layer Perceptron, XGBoost, an LSTM using GloVe embeddings, and an LSTM using fastText embeddings. We discuss our approach to removing ambiguity in the labeling process and give a detailed description of our work. © 2023 Copyright for this paper by its authors.
  • Item
    fastText-Based Siamese Network for Hindi Semantic Textual Similarity
    (Springer Science and Business Media Deutschland GmbH, 2025) Chandrashekar, A.; Rushad, M.; Nambiar, A.; Rashmi, V.; Koolagudi, S.G.
    Semantic textual similarity measures the degree of semantic similarity or equivalence between two sentences. Semantically equivalent sentences can be substituted for each other without changing the meaning. Various rule-based and machine learning models have quickly gained prominence in the field, especially for languages like English, which have an abundance of lexical tools and resources. However, other languages such as Hindi have seen little improvement in state-of-the-art methods and models for evaluating the semantic similarity of text. This paper proposes a fastText-based Siamese neural network architecture to evaluate the semantic equivalence of a Hindi sentence pair. Each pair is scored on a scale of 0–5, where 0 indicates least similar and 5 indicates most similar. The corpus combines two datasets of manually scored sentence pairs. The approach is evaluated using model accuracy and model loss over multiple training epochs. The proposed architecture combines a fastText-based embedding layer and a bidirectional Long Short-Term Memory layer to produce a similarity score, and it can extract semantic and various global features of the text. The model achieves an accuracy of 85.5% on a compiled Hindi-Hindi sentence pair dataset, a considerable improvement over existing rule-based and supervised systems. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
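
The Siamese scoring pattern described in the Hindi semantic textual similarity abstract can be illustrated with a minimal sketch: both sentences pass through the same encoder, and the similarity of the two encodings is mapped onto the 0–5 scale. Everything concrete here is an assumption for illustration and not the authors' implementation: the tiny hand-written vectors stand in for fastText embeddings, mean pooling stands in for the bidirectional LSTM, cosine similarity stands in for the learned scoring layer, and the romanized tokens and function names are hypothetical.

```python
import math

# Hypothetical 3-dimensional vectors standing in for fastText embeddings.
TOY_VECTORS = {
    "ghar": [0.9, 0.1, 0.0],
    "makaan": [0.8, 0.2, 0.1],
    "bada": [0.1, 0.9, 0.0],
    "vishaal": [0.2, 0.8, 0.1],
    "nadi": [0.0, 0.1, 0.9],
}


def encode(sentence):
    """Shared encoder: mean of known word vectors (stand-in for the BiLSTM)."""
    vectors = [TOY_VECTORS[w] for w in sentence.lower().split() if w in TOY_VECTORS]
    if not vectors:
        return [0.0, 0.0, 0.0]
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)


def similarity_score(sentence_a, sentence_b):
    """Score a pair on a 0-5 scale using the same encoder for both inputs."""
    cos = cosine(encode(sentence_a), encode(sentence_b))
    # Clip to [0, 1] before scaling so the result stays within 0-5.
    return 5.0 * max(0.0, min(1.0, cos))


near = similarity_score("bada ghar", "vishaal makaan")
far = similarity_score("bada ghar", "nadi")
print(near > far)
```

Because the encoder is shared, the score is symmetric in its two arguments, which is the property that distinguishes a Siamese setup from two independently trained branches.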
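
The soft-score idea in the sexism detection abstract, averaging the predictions of several models to imitate a panel of human annotators, can be sketched as follows. This is a sketch under stated assumptions: it treats each model as casting one hard vote, and the label names, model keys, and the `soft_scores` function are hypothetical. The abstract does not say whether hard labels or probabilities were averaged.

```python
from collections import Counter


def soft_scores(votes, labels=("YES", "NO")):
    """Convert per-model hard labels into a fraction of votes per label.

    `votes` maps a model name to its predicted label (assumed format).
    """
    if not votes:
        raise ValueError("need at least one model prediction")
    unknown = set(votes.values()) - set(labels)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    counts = Counter(votes.values())
    total = len(votes)
    return {label: counts[label] / total for label in labels}


# Hypothetical votes for one tweet from the six model types the abstract names.
votes = {
    "multinomial_nb": "YES",
    "linear_svc": "YES",
    "mlp": "NO",
    "xgboost": "YES",
    "lstm_glove": "NO",
    "lstm_fasttext": "YES",
}

scores = soft_scores(votes)
print(scores)
```

Treating each model as one annotator makes the output a distribution over labels that sums to 1, the same shape as a soft label derived from annotator disagreement.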