Semantic Similarity and Paraphrase Identification for Malayalam Using Deep Autoencoders
Date
2021
Publisher
Springer Science and Business Media Deutschland GmbH
Abstract
In this chapter, we address sentence-level paraphrase identification for the Malayalam language. We use a recursive autoencoder architecture for unsupervised learning of phrase representations, which provide the features for paraphrase identification. Sentence features of varying lengths are converted to a fixed-size representation using the convolution-based dynamic pooling method. The Malayalam paraphrase identification system was initially designed to distinguish only paraphrases from non-paraphrases, and was later extended to also identify semi-equivalent paraphrases. Conventional statistical features are used alongside the semantic features, which improves system performance. The proposed system, implemented with word2vec embeddings, achieved 77.67% accuracy in the two-class setting and 66.07% in the three-class setting. This chapter also describes the experiments conducted to select the best parameters and embedding model. © 2021, The Author(s), under exclusive license to Springer Nature Switzerland AG.
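The dynamic pooling step mentioned in the abstract can be sketched as follows. This is a minimal pure-Python sketch, assuming the dynamic min-pooling scheme of Socher et al. (2011) for paraphrase detection with recursive autoencoders: the node vectors of the two sentences' trees are compared pairwise with Euclidean distance, the resulting variable-size matrix is split into a fixed grid of roughly equal regions, and the minimum of each region is kept. The chapter's own pooling details, grid size, and normalization are not stated here, so the function names (`euclidean_distance_matrix`, `dynamic_min_pool`, `normalize`) and the 4×4 grid in the usage example are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def euclidean_distance_matrix(a: Matrix, b: Matrix) -> Matrix:
    """Pairwise Euclidean distances between the node vectors of two sentences."""
    return [[math.dist(u, v) for v in b] for u in a]


def _partition(length: int, parts: int) -> List[List[int]]:
    """Split indices 0..length-1 into `parts` contiguous, roughly equal groups."""
    if length >= parts:
        bounds = [i * length // parts for i in range(parts + 1)]
        return [list(range(bounds[i], bounds[i + 1])) for i in range(parts)]
    # Fewer rows/columns than grid cells: duplicate indices so every cell gets one.
    return [[k * length // parts] for k in range(parts)]


def dynamic_min_pool(sim: Matrix, pooled: int) -> Matrix:
    """Map a variable-size distance matrix to a fixed pooled x pooled matrix
    by taking the minimum (closest match) inside each grid region."""
    row_parts = _partition(len(sim), pooled)
    col_parts = _partition(len(sim[0]), pooled)
    return [
        [min(sim[r][c] for r in rows for c in cols) for cols in col_parts]
        for rows in row_parts
    ]


def normalize(pooled: Matrix) -> List[float]:
    """Flatten the pooled matrix and scale it to zero mean, unit variance."""
    flat = [x for row in pooled for x in row]
    mean = sum(flat) / len(flat)
    std = math.sqrt(sum((x - mean) ** 2 for x in flat) / len(flat))
    return [(x - mean) / std if std > 0 else 0.0 for x in flat]


if __name__ == "__main__":
    # Toy node vectors: 5 nodes for sentence A, 3 nodes for sentence B.
    sent_a = [[float(i), 0.0] for i in range(5)]
    sent_b = [[float(j), 1.0] for j in range(3)]
    features = normalize(dynamic_min_pool(euclidean_distance_matrix(sent_a, sent_b), 4))
    print(len(features))  # fixed-size feature vector regardless of sentence lengths
```

Whatever the sentence lengths, the classifier always receives the same number of features (here 16); any statistical features would be appended to this vector before classification.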
Keywords
Deep learning, GloVe, Malayalam paraphrase identification, Recursive autoencoders, Word2vec
Citation
Signals and Communication Technology, 2021, pp. 81-96
