Detecting Semantic Similarity of Documents Using Natural Language Processing

Date

2021

Publisher

Elsevier B.V.

Abstract

The similarity of two natural-language documents can be judged by how similar the embeddings of their textual content are. Embeddings capture the lexical and semantic information of texts, and they can be obtained either through bag-of-words approaches that combine the embeddings of the constituent words or through pre-trained encoders. This paper examines various existing approaches for obtaining embeddings from texts, which are then used to detect the similarity between those texts. A novel model that builds upon the Universal Sentence Encoder is also developed for the same task. The explored models are tested on the SICK dataset, and the correlation between the ground-truth similarity values given in the dataset and the predicted similarity is computed using the Pearson, Spearman and Kendall's Tau correlation metrics. Experimental results demonstrate that the novel model outperforms the existing approaches. Finally, an application is developed using the novel model to detect semantic similarity between a set of documents. © 2021 Elsevier B.V. All rights reserved.
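The evaluation pipeline described in the abstract (embed each text, score each pair, then correlate the predicted scores with ground-truth scores) can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's model: it uses the bag-of-words baseline (the mean of word vectors) rather than the Universal Sentence Encoder, cosine similarity as the pair score (an assumption, since the abstract does not name the similarity function), and tiny hypothetical word vectors and made-up gold scores that are not taken from the SICK dataset. Kendall's Tau is computed here as tau-a; library implementations such as SciPy's default to tau-b, which differs when ties are present.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical 3-d word vectors for illustration only; a real system would
# load pre-trained word embeddings instead.
WORD_VECTORS: Dict[str, List[float]] = {
    "a": [0.1, 0.1, 0.1],
    "man": [0.9, 0.2, 0.1],
    "woman": [0.8, 0.3, 0.1],
    "is": [0.1, 0.1, 0.2],
    "playing": [0.2, 0.9, 0.3],
    "guitar": [0.1, 0.7, 0.9],
    "piano": [0.2, 0.6, 0.8],
    "sleeping": [0.7, 0.1, 0.6],
}
DIM = 3


def sentence_embedding(text: str) -> List[float]:
    """Bag-of-words embedding: mean of the vectors of known words."""
    vecs = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vecs:
        return [0.0] * DIM  # no known words: fall back to the zero vector
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(DIM)]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's r (undefined, raises ZeroDivisionError, if an input is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def kendall_tau_a(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)  # tied pairs contribute 0
    return s / (n * (n - 1) / 2)


if __name__ == "__main__":
    pairs = [
        ("a man is playing guitar", "a man is playing piano"),
        ("a man is playing guitar", "a woman is playing guitar"),
        ("a man is playing guitar", "a woman is sleeping"),
    ]
    gold = [4.2, 4.6, 1.5]  # made-up relatedness scores, NOT from SICK
    predicted = [cosine(sentence_embedding(a), sentence_embedding(b)) for a, b in pairs]
    print("Pearson :", round(pearson(predicted, gold), 3))
    print("Spearman:", round(spearman(predicted, gold), 3))
    print("Kendall :", round(kendall_tau_a(predicted, gold), 3))
```

Swapping `sentence_embedding` for a pre-trained sentence encoder leaves the scoring and correlation steps unchanged, which is what makes this setup convenient for comparing embedding approaches on a common benchmark.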

Keywords

Computational Linguistics, Deep Learning, Embeddings, Natural Language Processing, Semantic Similarity

Citation

Procedia CIRP, 2021, Vol. 189, pp. 128-135
