Detecting Semantic Similarity of Documents Using Natural Language Processing

dc.contributor.author: Agarwala, S.
dc.contributor.author: Anagawadi, A.
dc.contributor.author: Reddy Guddeti, R.M.
dc.date.accessioned: 2026-02-06T06:36:16Z
dc.date.issued: 2021
dc.description.abstract: The similarity of documents in natural languages can be judged by how similar the embeddings of their textual content are. Embeddings capture the lexical and semantic information of texts, and they can be obtained through bag-of-words approaches that use the embeddings of constituent words, or through pre-trained encoders. This paper examines various existing approaches to obtaining embeddings from texts, which are then used to detect similarity between them. A novel model that builds upon the Universal Sentence Encoder is also developed for the same purpose. The explored models are tested on the SICK dataset, and the correlation between the ground truth values given in the dataset and the predicted similarity is computed using the Pearson, Spearman and Kendall's Tau correlation metrics. Experimental results demonstrate that the novel model outperforms the existing approaches. Finally, an application is developed using the novel model to detect semantic similarity between a set of documents. © 2021 Elsevier B.V. All rights reserved.
dc.identifier.citation: Procedia CIRP, 2021, Vol. 189, p. 128-135
dc.identifier.issn: 22128271
dc.identifier.uri: https://doi.org/10.1016/j.procs.2021.05.076
dc.identifier.uri: https://idr.nitk.ac.in/handle/123456789/30349
dc.publisher: Elsevier B.V.
dc.subject: Computational Linguistics
dc.subject: Deep Learning
dc.subject: Embeddings
dc.subject: Natural Language Processing
dc.subject: Semantic Similarity
dc.title: Detecting Semantic Similarity of Documents Using Natural Language Processing
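
The evaluation pipeline summarized in the abstract (embed each text, score a pair by the similarity of its embeddings, then correlate the predicted scores with ground truth) can be illustrated with a minimal sketch. This is not the paper's model: the word vectors, sentence pairs, and gold scores below are invented for illustration, cosine similarity is assumed as the similarity measure, and the rank-based metrics assume no tied values.

```python
import math

# Hypothetical 3-dimensional word vectors, invented for illustration only.
TOY_VECTORS = {
    "dog": [0.9, 0.1, 0.0],
    "puppy": [0.8, 0.2, 0.1],
    "runs": [0.1, 0.9, 0.0],
    "plays": [0.2, 0.8, 0.1],
    "car": [0.0, 0.1, 0.9],
    "stops": [0.1, 0.3, 0.7],
}

def embed(text):
    """Bag-of-words embedding: average the vectors of the known words."""
    vecs = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vecs:
        raise ValueError("no known words in text")
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """Ranks from 1 to n in ascending order; assumes no ties, for brevity."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            score += (prod > 0) - (prod < 0)
    return score / (n * (n - 1) / 2)

# Made-up sentence pairs and made-up gold relatedness scores.
pairs = [
    ("dog runs", "puppy plays"),
    ("dog runs", "car stops"),
    ("dog plays", "puppy runs"),
]
gold = [4.2, 1.5, 4.6]
predicted = [cosine(embed(a), embed(b)) for a, b in pairs]

print("Pearson: ", pearson(gold, predicted))
print("Spearman:", spearman(gold, predicted))
print("Kendall: ", kendall_tau(gold, predicted))
```

In practice a pre-trained encoder would replace `embed`, and a statistics library would normally be used for the three correlation metrics, since it also handles ties.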
