fastText-Based Siamese Network for Hindi Semantic Textual Similarity

Date

2025

Publisher

Springer Science and Business Media Deutschland GmbH

Abstract

Semantic textual similarity measures the degree to which two sentences are semantically similar or equivalent. Semantically equivalent sentences can be substituted for one another without changing the meaning. Various rule-based and machine learning models have quickly gained prominence in the field, especially for a language like English, which has an abundance of lexical tools and resources. However, other languages such as Hindi have not seen much improvement in state-of-the-art methods and models for evaluating the semantic similarity of text. This paper proposes a fastText-based Siamese neural network architecture to evaluate the semantic equivalence of a Hindi sentence pair. Each pair is scored on a scale of 0–5, where 0 indicates least similar and 5 indicates most similar. The corpus combines two datasets of manually scored sentence pairs. The approach is evaluated using model accuracy and model loss over a training period of multiple epochs. The proposed architecture combines a fastText-based embedding layer with a bidirectional Long Short-Term Memory (LSTM) layer, which lets it extract semantic and various global features of the text to determine a similarity score. The model achieves an accuracy of 85.5% on a compiled Hindi-Hindi sentence pair dataset, which is a considerable improvement over existing rule-based and supervised systems. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
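
The Siamese idea in the abstract (one shared encoder applied to both sentences, followed by a comparison mapped to the 0–5 scale) can be illustrated with a minimal, untrained sketch. It is not the paper's implementation. The bidirectional LSTM is replaced here by simple mean pooling, the embedding table is a deterministic pseudo-random stand-in for trained fastText vectors, and the CRC32 hash stands in for fastText's own hashing. The names `DIM`, `BUCKETS`, `char_ngrams`, `bucket_vector`, `encode`, and `similarity_score` are hypothetical and chosen only for illustration.

```python
import math
import random
import zlib

DIM = 16        # embedding size (hypothetical, kept small for illustration)
BUCKETS = 2000  # number of hashed n-gram buckets (hypothetical)


def char_ngrams(word, n_min=3, n_max=5):
    """fastText-style subword units: character n-grams of the word wrapped in < and >."""
    wrapped = f"<{word}>"
    return [wrapped[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(wrapped) - n + 1)]


def bucket_vector(bucket):
    """Deterministic pseudo-random vector standing in for a trained embedding row."""
    rng = random.Random(bucket)
    return [rng.gauss(0.0, 1.0) for _ in range(DIM)]


def encode(sentence):
    """Shared encoder used by both branches: mean of hashed n-gram vectors.

    Assumption: mean pooling replaces the bidirectional LSTM described in the abstract.
    """
    total = [0.0] * DIM
    count = 0
    for word in sentence.split():
        for gram in char_ngrams(word):
            vec = bucket_vector(zlib.crc32(gram.encode("utf-8")) % BUCKETS)
            total = [t + v for t, v in zip(total, vec)]
            count += 1
    return [t / count for t in total] if count else total


def similarity_score(sent_a, sent_b):
    """Encode both sentences with the same encoder and map cosine similarity to 0-5."""
    u, v = encode(sent_a), encode(sent_b)
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return 0.0  # an empty sentence has no representation to compare
    cosine = sum(x * y for x, y in zip(u, v)) / norm
    return 5.0 * min(1.0, max(0.0, cosine))


# Usage: two Hindi sentences scored by the untrained sketch.
print(similarity_score("मुझे किताबें पढ़ना पसंद है", "मुझे पुस्तकें पढ़ना अच्छा लगता है"))
```

Because both sentences pass through the same `encode` function, the score is symmetric in its arguments, which is the property weight sharing gives a Siamese network. The scores from this sketch reflect only character overlap; meaningful scores would require trained embeddings and a trained sequence encoder.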

Keywords

Embeddings, fastText, Hindi, LSTM, Semantics, Siamese network

Citation

Lecture Notes in Networks and Systems, 2025, Vol. 1295, pp. 53-64
