fastText-Based Siamese Network for Hindi Semantic Textual Similarity

dc.contributor.author: Chandrashekar, A.
dc.contributor.author: Rushad, M.
dc.contributor.author: Nambiar, A.
dc.contributor.author: Rashmi, V.
dc.contributor.author: Koolagudi, S.G.
dc.date.accessioned: 2026-02-06T06:33:29Z
dc.date.issued: 2025
dc.description.abstract: Semantic textual similarity measures the degree to which two sentences are semantically similar or equivalent. Sentences in a semantically equivalent pair can substitute for each other while retaining their meaning. Various rule-based and machine learning models have quickly gained prominence in the field, especially for languages such as English, which have an abundance of lexical tools and resources. However, other languages such as Hindi have not seen much improvement in state-of-the-art methods and models for evaluating the semantic similarity of text data. This paper proposes a fastText-based Siamese neural network architecture to evaluate the semantic equivalence of a Hindi sentence pair. Each pair is scored on a scale of 0–5, where 0 indicates least similar and 5 indicates most similar. The corpus combines two datasets of manually scored sentence pairs. The approach is evaluated using model accuracy and model loss over multiple training epochs. The proposed architecture incorporates a fastText-based embedding layer and a bidirectional Long Short-Term Memory (LSTM) layer, which extract semantic and various global features of the text to determine a similarity score. The model achieves an accuracy of 85.5% on a compiled Hindi-Hindi sentence pair dataset, a considerable improvement over existing rule-based and supervised systems. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
dc.identifier.citation: Lecture Notes in Networks and Systems, 2025, Vol. 1295, p. 53-64
dc.identifier.issn: 23673370
dc.identifier.uri: https://doi.org/10.1007/978-981-96-3311-1_5
dc.identifier.uri: https://idr.nitk.ac.in/handle/123456789/28692
dc.publisher: Springer Science and Business Media Deutschland GmbH
dc.subject: Embeddings
dc.subject: fastText
dc.subject: Hindi
dc.subject: LSTM
dc.subject: Semantics
dc.subject: Siamese network
dc.title: fastText-Based Siamese Network for Hindi Semantic Textual Similarity