Faculty Publications

Permanent URI for this community: https://idr.nitk.ac.in/handle/123456789/18736

Publications by NITK Faculty

Search Results

  • Item
    JSON Document Clustering Based on Structural Similarity and Semantic Fusion
    (Springer Science and Business Media Deutschland GmbH, 2023) Uma Priya, D.; Santhi Thilagam, P.S.
    The emerging drift toward real-time applications generates exponentially growing volumes of JSON data on the web. Dealing with the heterogeneous structures of JSON document collections is challenging for efficient data management and knowledge discovery. Clustering JSON documents has become a significant issue in organizing large data collections. Existing research has focused on clustering JSON documents using structural or semantic similarity measures. However, differently annotated JSON structures are also related by the context of the JSON attributes. As a result, existing research work is unable to identify the context hidden in the schemas, emphasizing the importance of leveraging the syntactic, semantic, and contextual properties of heterogeneous JSON schemas. To address this research gap, this work proposes JSON Similarity (JSim), a novel approach for clustering JSON documents by combining the structural and semantic similarity scores of JSON schemas. In order to capture more semantics, the semantic fusion method is proposed, which correlates schemas using semantic as well as contextual similarity measures. The JSON documents are clustered based on the weighted similarity matrix. The results and findings show that the proposed approach outperforms the current approaches significantly. © 2023, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
  • Item
    ClustVariants: An Approach for Schema Variants Extraction from JSON Document Collections
    (Institute of Electrical and Electronics Engineers Inc., 2022) Uma Priya, D.; Santhi Thilagam, P.S.
    The use of NoSQL Document Stores has grown in recent years as they offer the potential for increased scalability, flexibility, and consistency to store a massive collection of varied structured data in JSON format. Although the document stores do not impose any structural constraint on the data, the lack of schema information challenges efficient data processing, data management, and data integration. Hence, extant research has focused on identifying the global schema for a collection. Nevertheless, it comes at the cost of losing essential benefits of schema such as a detailed structural description of data, query optimization, etc. To address this research gap, we propose ClustVariants, a novel approach for discovering the exact schema variants available in a collection. Because the complex structure of large heterogeneous JSON data cannot be analyzed directly, we resolve this limitation by systematically extracting the structure of the data, analyzing the fields, and clustering the homogeneous documents. We apply a distributed Formal Concept Analysis algorithm, using Apache Spark, to identify the schema variants from a large cluster of JSON documents. The experimental study on real datasets proves that ClustVariants is efficient in inferring exact schema variants of JSON document collections. © 2022 IEEE.
  • Item
    Leveraging Structural and Semantic Measures for JSON Document Clustering
    (IICM, 2023) Uma Priya, D.; Santhi Thilagam, P.S.
    In recent years, the increased use of smart devices and digital business opportunities has generated massive heterogeneous JSON data daily, making efficient data storage and management more difficult. Existing research uses different similarity metrics and clusters the documents to support the above tasks effectively. However, extant approaches have focused on either structural or semantic similarity of schemas. As JSON documents are application-specific, differently annotated JSON schemas are not only structurally heterogeneous but also differ by the context of the JSON attributes. Therefore, there is a need to consider the structural, semantic, and contextual properties of JSON schemas to perform meaningful clustering of JSON documents. This work proposes an approach to cluster heterogeneous JSON documents using the similarity fusion method. The similarity fusion matrix is constructed using structural, semantic, and contextual measures of JSON schemas. The experimental results demonstrate that the proposed approach outperforms the existing approaches significantly. © 2023, IICM. All rights reserved.
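
The abstracts above describe combining structural and semantic similarity scores of JSON schemas into a weighted similarity matrix. The listing does not give the authors' actual measures, so the following is only a minimal sketch under stated assumptions: structural similarity is taken as Jaccard overlap of field paths, the "semantic" score is a crude stand-in based on shared attribute-name tokens, and the function names and the `weight` parameter are hypothetical.

```python
import re


def field_paths(value, prefix=""):
    """Collect the dotted field paths of a parsed JSON value (arrays are marked with [])."""
    paths = set()
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            paths.add(path)
            paths |= field_paths(child, path)
    elif isinstance(value, list):
        for item in value:
            paths |= field_paths(item, prefix + "[]")
    return paths


def name_tokens(paths):
    """Split attribute names into lowercase word tokens (snake_case and camelCase)."""
    tokens = set()
    for path in paths:
        for token in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", path):
            tokens.add(token.lower())
    return tokens


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def fused_similarity_matrix(docs, weight=0.5):
    """Pairwise weight * structural + (1 - weight) * token-based similarity.

    Assumption: the token overlap is only a placeholder for a real semantic
    or contextual measure, which the abstracts do not specify.
    """
    paths = [field_paths(doc) for doc in docs]
    tokens = [name_tokens(p) for p in paths]
    size = len(docs)
    matrix = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            structural = jaccard(paths[i], paths[j])
            semantic = jaccard(tokens[i], tokens[j])
            matrix[i][j] = weight * structural + (1 - weight) * semantic
    return matrix


if __name__ == "__main__":
    sample = [
        {"first_name": "Ada", "address": {"city": "London"}},
        {"firstName": "Alan", "address": {"city": "Wilmslow"}},
        {"title": "Report"},
    ]
    for row in fused_similarity_matrix(sample, weight=0.5):
        print([round(score, 2) for score in row])
```

The resulting matrix could then be passed to any clustering routine that accepts precomputed similarities; the choice of clustering algorithm is left open here because the abstracts do not name one.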