Detecting Semantic Similarity of Documents Using Natural Language Processing

Agarwala, S.; Anagawadi, A.; Reddy Guddeti, R.M.

Date

2021

Authors

Agarwala, S.

Anagawadi, A.

Reddy Guddeti, R.M.

Publisher

Elsevier B.V.

Abstract

The similarity of documents in natural languages can be judged based on how similar the embeddings corresponding to their textual content are. Embeddings capture the lexical and semantic information of texts, and they can be obtained through bag-of-words approaches using the embeddings of constituent words or through pre-trained encoders. This paper examines various existing approaches to obtain embeddings from texts, which are then used to detect similarity between them. A novel model which builds upon the Universal Sentence Encoder is also developed to do the same. The explored models are tested on the SICK dataset, and the correlation between the ground truth values given in the dataset and the predicted similarity is computed using the Pearson, Spearman and Kendall's Tau correlation metrics. Experimental results demonstrate that the novel model outperforms the existing approaches. Finally, an application is developed using the novel model to detect semantic similarity between a set of documents. © 2021 Elsevier B.V. All rights reserved.
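
The pipeline summarized in the abstract (embed each text, score similarity between embeddings, correlate predictions with ground truth) can be illustrated with a minimal sketch. This is not the paper's model: the word vectors in `TOY_VECTORS`, the sentence pairs, and the `gold` scores are all invented for illustration, cosine similarity is assumed as the similarity measure, and the rank-based metrics assume no tied values.

```python
import math

# Toy 3-dimensional word vectors, invented for illustration only.
TOY_VECTORS = {
    "dog": [0.9, 0.1, 0.0],
    "puppy": [0.8, 0.2, 0.1],
    "runs": [0.1, 0.9, 0.0],
    "sprints": [0.2, 0.8, 0.1],
    "stock": [0.0, 0.1, 0.9],
    "falls": [0.1, 0.2, 0.8],
}

def embed(text):
    """Bag-of-words embedding: average the vectors of the known words."""
    vectors = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vectors:
        raise ValueError("no known words in text")
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def ranks(values):
    """Ranks from 1 to n, smallest value first (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = float(rank)
    return result

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total pairs (assumes no ties)."""
    n = len(xs)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (xs[i] - xs[j]) * (ys[i] - ys[j])
            score += (prod > 0) - (prod < 0)
    return score / (n * (n - 1) / 2)

# Hypothetical sentence pairs with made-up relatedness scores.
pairs = [
    ("dog runs", "puppy sprints"),
    ("dog runs", "dog falls"),
    ("dog runs", "stock falls"),
]
gold = [4.5, 3.0, 1.2]
predicted = [cosine(embed(a), embed(b)) for a, b in pairs]

print("Pearson: ", pearson(gold, predicted))
print("Spearman:", spearman(gold, predicted))
print("Kendall: ", kendall_tau(gold, predicted))
```

In practice the toy lookup table would be replaced by pre-trained word vectors or a sentence encoder, and a statistics library that handles ties would be a safer choice for the rank correlations.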

Keywords

Computational Linguistics, Deep Learning, Embeddings, Natural Language Processing, Semantic Similarity

Citation

Procedia Computer Science, 2021, Vol. 189, p. 128-135

URI

https://doi.org/10.1016/j.procs.2021.05.076
https://idr.nitk.ac.in/handle/123456789/30349

Collections

Conference Papers
