Faculty Publications
Permanent URI for this community: https://idr.nitk.ac.in/handle/123456789/18736
Publications by NITK Faculty
3 results
Search Results
Item Load Balancing of MongoDB with Tag Aware Sharding (De Gruyter, 2016) Shegokar, P.; Thomas, M.V.; Chandrasekaran, K.
In the era of Big Data, traditional databases struggle to process data effectively. NoSQL databases address these problems and are becoming very popular. MongoDB is a document-oriented NoSQL database that can process large data sets with the help of its sharding mechanism. MongoDB supports range-based sharding, hash-based sharding, and tag aware sharding. The problem with the range-based method is that one of the shards will carry more load. With the hash-based method, the load can be distributed among shards, but related data are not kept together. The tag aware sharding method is better than the other two methods, but it needs to be enhanced: tagged data can belong to multiple shards, and the balancer has to migrate the data to the most appropriate shard. Hence we enhance tag aware sharding in MongoDB. Improper distribution of data prevents us from using all the benefits of sharding. Tag aware sharding is an administrator-based method in which tags are specified by the administrator. Data is migrated according to the tags, but the load still needs to be balanced. To solve this problem, the Weighted Round Robin (WRR) load balancing algorithm is used, and it has improved write and query performance. © 2016 Walter de Gruyter GmbH, Berlin/Boston.

Item Load balancing of MongoDB with tag aware sharding (Walter de Gruyter GmbH, 2015) Shegokar, P.; Thomas, M.V.; Chandrasekaran, K.
In the era of Big Data, traditional databases struggle to process data effectively. NoSQL databases address these problems and are becoming very popular. MongoDB is a document-oriented NoSQL database that can process large data sets with the help of its sharding mechanism. MongoDB supports range-based sharding, hash-based sharding, and tag aware sharding. The problem with the range-based method is that one of the shards will carry more load. With the hash-based method, the load can be distributed among shards, but related data are not kept together. The tag aware sharding method is better than the other two methods, but it needs to be enhanced: tagged data can belong to multiple shards, and the balancer has to migrate the data to the most appropriate shard. Hence we enhance tag aware sharding in MongoDB. Improper distribution of data prevents us from using all the benefits of sharding. Tag aware sharding is an administrator-based method in which tags are specified by the administrator. Data is migrated according to the tags, but the load still needs to be balanced. To solve this problem, the Weighted Round Robin (WRR) load balancing algorithm is used, and it has improved write and query performance.

Item ClustVariants: An Approach for Schema Variants Extraction from JSON Document Collections (Institute of Electrical and Electronics Engineers Inc., 2022) Uma Priya, D.; Santhi Thilagam, P.S.
The use of NoSQL document stores has grown in recent years, as they offer the potential for increased scalability, flexibility, and consistency when storing massive collections of variably structured data in JSON format. Although document stores do not impose any structural constraint on the data, the lack of schema information challenges efficient data processing, data management, and data integration. Hence, existing research has focused on identifying the global schema for a collection. Nevertheless, this comes at the cost of losing essential benefits of a schema, such as a detailed structural description of the data and query optimization. To address this research gap, we propose ClustVariants, a novel approach for discovering the exact schema variants available in a collection. Because the complex structure of large heterogeneous JSON data cannot be analyzed directly, we resolve this limitation by systematically extracting the structure of the data, analyzing the fields, and clustering the homogeneous documents. We apply a distributed Formal Concept Analysis algorithm, using Apache Spark, to identify the schema variants from a large cluster of JSON documents. The experimental study on real datasets shows that ClustVariants is efficient in inferring exact schema variants of JSON document collections. © 2022 IEEE.
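
To illustrate what "schema variants" means in the ClustVariants abstract above, here is a minimal Python sketch that extracts the field paths of each JSON document and groups documents with an identical structure. It is not the authors' method: the abstract describes a distributed Formal Concept Analysis algorithm on Apache Spark, whereas this sketch only groups by exact field-path sets on a single machine. The function names `field_paths` and `group_by_structure` and the sample documents are hypothetical.

```python
from collections import defaultdict


def field_paths(value, prefix=""):
    """Return the set of dotted field paths found in a parsed JSON value.

    Array elements are marked with a "[]" suffix; this path notation is an
    assumption made for this sketch, not taken from the paper.
    """
    if isinstance(value, dict):
        paths = set()
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            paths |= field_paths(child, path)
        return paths or {prefix}  # keep empty objects visible
    if isinstance(value, list):
        paths = set()
        for child in value:
            paths |= field_paths(child, f"{prefix}[]")
        return paths or {f"{prefix}[]"}  # keep empty arrays visible
    return {prefix}  # scalar leaf (string, number, boolean, null)


def group_by_structure(documents):
    """Map each distinct field-path set to the indexes of documents having it."""
    groups = defaultdict(list)
    for index, doc in enumerate(documents):
        groups[frozenset(field_paths(doc))].append(index)
    return dict(groups)


# Hypothetical sample collection with two structural variants.
docs = [
    {"name": "a", "address": {"city": "x"}},
    {"name": "b", "address": {"city": "y"}},
    {"name": "c", "tags": ["t1", "t2"]},
]
variants = group_by_structure(docs)

for structure, members in variants.items():
    print(sorted(structure), members)
```

Grouping by exact field-path sets treats two documents as the same variant only when their structures match completely, which is the simplest possible notion of a variant; a lattice-based method such as Formal Concept Analysis can additionally relate variants that share subsets of fields.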
