Detecting Semantic Similarity of Documents Using Natural Language Processing

Agarwala, S.; Anagawadi, A.; Reddy Guddeti, R.M.

dc.contributor.author	Agarwala, S.
dc.contributor.author	Anagawadi, A.
dc.contributor.author	Reddy Guddeti, R.M.
dc.date.accessioned	2026-02-06T06:36:16Z
dc.date.issued	2021
dc.description.abstract	The similarity of natural-language documents can be judged by how similar the embeddings of their textual content are. Embeddings capture the lexical and semantic information of texts, and they can be obtained through bag-of-words approaches that combine the embeddings of constituent words or through pre-trained encoders. This paper examines various existing approaches to obtaining embeddings from texts, which are then used to detect similarity between them. A novel model that builds upon the Universal Sentence Encoder is also developed for the same purpose. The explored models are tested on the SICK dataset, and the correlation between the ground-truth values given in the dataset and the predicted similarity is computed using the Pearson, Spearman and Kendall's Tau correlation metrics. Experimental results demonstrate that the novel model outperforms the existing approaches. Finally, an application is developed using the novel model to detect semantic similarity between a set of documents. © 2021 Elsevier B.V. All rights reserved.
dc.identifier.citation	Procedia CIRP, 2021, Vol. 189, p. 128-135
dc.identifier.issn	22128271
dc.identifier.uri	https://doi.org/10.1016/j.procs.2021.05.076
dc.identifier.uri	https://idr.nitk.ac.in/handle/123456789/30349
dc.publisher	Elsevier B.V.
dc.subject	Computational Linguistics
dc.subject	Deep Learning
dc.subject	Embeddings
dc.subject	Natural Language Processing
dc.subject	Semantic Similarity
dc.title	Detecting Semantic Similarity of Documents Using Natural Language Processing
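
The evaluation pipeline summarised in the abstract (embed each text, score a pair by embedding similarity, then correlate the scores with human ratings) can be illustrated with a minimal sketch. This is not the paper's model: the word vectors, sentence pairs, and ratings below are hypothetical toy values, the bag-of-words embedding is a plain average of word vectors, cosine similarity is assumed as the similarity measure, and only the Pearson correlation is shown.

```python
import math

# Hypothetical 3-dimensional word vectors, for illustration only.
TOY_VECTORS = {
    "dog": [0.9, 0.1, 0.0],
    "puppy": [0.8, 0.2, 0.1],
    "runs": [0.1, 0.9, 0.0],
    "plays": [0.2, 0.8, 0.1],
    "stock": [0.0, 0.1, 0.9],
    "falls": [0.1, 0.2, 0.8],
}


def embed(text):
    """Bag-of-words embedding: average the vectors of the known words."""
    vecs = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vecs:
        raise ValueError("no known words in text")
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical sentence pairs with made-up human relatedness ratings.
pairs = [
    ("dog runs", "puppy plays"),
    ("dog runs", "stock falls"),
    ("puppy plays", "stock falls"),
]
gold = [4.5, 1.0, 1.2]

predicted = [cosine(embed(a), embed(b)) for a, b in pairs]
r = pearson(gold, predicted)
print(predicted, r)
```

In practice the toy vectors would be replaced by pre-trained word embeddings or a sentence encoder, and the Spearman and Kendall's Tau coefficients would be computed alongside Pearson on the full dataset.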

Collections

Conference Papers
