JSON Document Clustering Based on Structural Similarity and Semantic Fusion

Uma Priya, D.; Santhi Thilagam, P.S.

dc.contributor.author	Uma Priya, D.
dc.contributor.author	Santhi Thilagam, P.S.
dc.date.accessioned	2026-02-08T16:50:04Z
dc.date.issued	2023
dc.description.abstract	The growing shift toward real-time applications is generating rapidly increasing volumes of JSON data on the web. The heterogeneous structures of JSON document collections make efficient data management and knowledge discovery challenging, so clustering JSON documents has become important for organizing large data collections. Existing research clusters JSON documents using either structural or semantic similarity measures. However, differently annotated JSON structures are also related through the context of their attributes, which existing approaches cannot identify in the schemas. This highlights the importance of leveraging the syntactic, semantic, and contextual properties of heterogeneous JSON schemas. To address this gap, this work proposes JSON Similarity (JSim), a novel approach that clusters JSON documents by combining the structural and semantic similarity scores of JSON schemas. To capture more semantics, a semantic fusion method is proposed that correlates schemas using both semantic and contextual similarity measures. The JSON documents are then clustered based on the weighted similarity matrix. The results show that the proposed approach significantly outperforms current approaches. © 2023, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
dc.identifier.citation	Lecture Notes on Data Engineering and Communications Technologies, 2023, Vol.163, p. 51-62
dc.identifier.issn	23674512
dc.identifier.uri	https://doi.org/10.56578/jemse040403
dc.identifier.uri	https://idr.nitk.ac.in/handle/123456789/33633
dc.publisher	Springer Science and Business Media Deutschland GmbH
dc.subject	Clustering
dc.subject	JSON
dc.subject	Semantic similarity
dc.subject	Structural similarity
dc.title	JSON Document Clustering Based on Structural Similarity and Semantic Fusion

Collections

Book Chapters
