fastText-Based Siamese Network for Hindi Semantic Textual Similarity
| dc.contributor.author | Chandrashekar, A. | |
| dc.contributor.author | Rushad, M. | |
| dc.contributor.author | Nambiar, A. | |
| dc.contributor.author | Rashmi, V. | |
| dc.contributor.author | Koolagudi, S.G. | |
| dc.date.accessioned | 2026-02-06T06:33:29Z | |
| dc.date.issued | 2025 | |
| dc.description.abstract | Semantic textual similarity measures the degree of semantic similarity or equivalence between two sentences. Semantically equivalent sentences can be substituted for one another without a change in meaning. Various rule-based and machine learning models have quickly gained prominence in the field, especially for a language like English, where lexical tools and resources are abundant. However, other languages such as Hindi have seen little improvement in state-of-the-art methods and models for evaluating the semantic similarity of text data. This paper proposes a fastText-based Siamese neural network architecture to evaluate the semantic equivalence of a Hindi sentence pair. The pair is scored on a scale of 0–5, where 0 indicates least similar and 5 indicates most similar. The corpus combines two datasets of manually scored sentence pairs. The performance metrics used to evaluate this approach are model accuracy and model loss over a training period of multiple epochs. The proposed architecture incorporates a fastText-based embedding layer and a bidirectional Long Short-Term Memory (LSTM) layer to produce a similarity score. It can extract semantic and various global features of the text to determine that score. The model achieves an accuracy of 85.5% on a compiled Hindi-Hindi sentence pair dataset, which is a considerable improvement over existing rule-based and supervised systems. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025. | |
| dc.identifier.citation | Lecture Notes in Networks and Systems, 2025, Vol. 1295, pp. 53-64 | |
| dc.identifier.issn | 23673370 | |
| dc.identifier.uri | https://doi.org/10.1007/978-981-96-3311-1_5 | |
| dc.identifier.uri | https://idr.nitk.ac.in/handle/123456789/28692 | |
| dc.publisher | Springer Science and Business Media Deutschland GmbH | |
| dc.subject | Embeddings | |
| dc.subject | fastText | |
| dc.subject | Hindi | |
| dc.subject | LSTM | |
| dc.subject | Semantics | |
| dc.subject | Siamese network | |
| dc.title | fastText-Based Siamese Network for Hindi Semantic Textual Similarity |
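
The abstract describes a Siamese setup: both sentences pass through the same encoder, and the two encodings are compared to give a score between 0 and 5. The sketch below illustrates only that data flow and is not the authors' implementation. Everything in it is an assumption made for illustration: the hashed character n-gram vectors are a stand-in for trained fastText embeddings (the 3 to 6 character range and the `<`/`>` boundary markers follow fastText's defaults), mean pooling replaces the bidirectional LSTM layer, and cosine similarity scaled to 0–5 replaces the learned scoring, which the abstract does not specify. The names `char_ngrams`, `ngram_vector`, `encode`, and `similarity_score` are hypothetical.

```python
import hashlib
import math

DIM = 16  # toy embedding size (assumption); trained fastText vectors are much larger


def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of a word wrapped in fastText-style boundary markers."""
    marked = f"<{word}>"
    return [marked[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(marked) - n + 1)]


def ngram_vector(ngram):
    """Deterministic pseudo-random vector; a stand-in for a trained n-gram embedding."""
    digest = hashlib.sha256(ngram.encode("utf-8")).digest()
    return [(b - 127.5) / 127.5 for b in digest[:DIM]]


def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def encode(sentence):
    """Shared encoder applied to both inputs (the Siamese part).

    Mean pooling is used here in place of the bidirectional LSTM layer.
    """
    words = sentence.split()
    if not words:
        raise ValueError("sentence must contain at least one word")
    word_vectors = [mean([ngram_vector(g) for g in char_ngrams(w)]) for w in words]
    return mean(word_vectors)


def similarity_score(sentence_a, sentence_b):
    """Cosine similarity of the two encodings, clipped to [0, 1] and scaled to 0-5."""
    u, v = encode(sentence_a), encode(sentence_b)
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    cosine = dot / norm if norm else 0.0
    return 5.0 * min(1.0, max(0.0, cosine))


if __name__ == "__main__":
    print(similarity_score("मुझे किताबें पढ़ना पसंद है", "मुझे पुस्तकें पढ़ना अच्छा लगता है"))
```

Because the vectors here are untrained, the printed score reflects only character overlap between the two sentences, not meaning. In a trained model the embedding layer, the recurrent encoder, and the scoring step would all be learned from the manually scored sentence pairs.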
