Conference Papers
Permanent URI for this collection: https://idr.nitk.ac.in/handle/123456789/28506
Search Results
Item: Effective Resource Utilization in Hadoop Using Ganglia (Institute of Electrical and Electronics Engineers Inc., 2024)
Srungarapati, B.; Pamarthi, M.; Vakada, V.; Hegde, A.; Bhowmik, B.
The exponential growth of big data has led to the widespread adoption of Hadoop clusters for storing and processing large volumes of data. Efficient resource management within these clusters is crucial for achieving optimal performance and cost efficiency. This paper explores the use of Hadoop and Ganglia for monitoring and optimizing resource utilization in Hadoop clusters. The study demonstrates that combining Hadoop with Ganglia is an effective strategy for improving cluster performance and resource efficiency: results show significant gains in both, underscoring the importance of proactive resource management in Hadoop environments. © 2024 IEEE. (A minimal metric-polling sketch appears after this listing.)

Item: Leveraging Hybrid Modeling for Enhanced Runtime Prediction in Big Data Jobs (Institute of Electrical and Electronics Engineers Inc., 2024)
Singh, R.; Zadokar, V.N.; Kumar, S.; Doddamani, S.S.; Bhowmik, B.
In an era of rapid data expansion, big data has significantly transformed various industries, redefining how data is processed, analyzed, and utilized. The widespread adoption of digital technologies has driven this surge, producing an unprecedented accumulation of information from sources such as social media, sensors, and transactions. As big data evolves, it presents significant challenges and unique opportunities, necessitating innovative solutions to leverage its potential fully. One critical challenge in big data environments is accurately predicting job runtimes, which is essential for optimizing resource utilization and enhancing overall system performance. Current approaches, including analytical models and machine learning algorithms, often struggle to manage the complexities of unstructured data while maintaining interpretability. This paper proposes a novel hybrid modeling approach that integrates the strengths of both techniques to improve job runtime predictions. The hybrid architecture combines an analytical model, which captures the intricate characteristics of jobs and their execution environments, with a machine learning model trained to detect patterns and relationships in historical data. As demonstrated on real-world big datasets, the hybrid model achieves greater accuracy by merging these capabilities. Utilizing the flexible capabilities of PySpark and advanced feature engineering techniques, the model adapts dynamically to varying dataset sizes and complexities, ensuring robust performance across scenarios. © 2024 IEEE. (A hybrid-prediction sketch follows below.)

Item: Optimizing Feature Selection in Big Data: A Hybrid Spark and Fuzzy Approach (Institute of Electrical and Electronics Engineers Inc., 2024)
Hada, A.S.; Sahoo, G.S.; Vamsi, C.K.; Hegde, A.; Bhowmik, B.
The exponential growth of big data presents both immense opportunities and significant challenges. While vast datasets hold the key to groundbreaking insights, extracting value efficiently requires sophisticated feature selection techniques, and traditional methods often struggle with the sheer volume and complexity of big data. This paper addresses the challenge by proposing a novel hybrid feature selection algorithm that leverages Apache PySpark's distributed computing power. Combining a robust feature selection technique with a novel weighting scheme, the method outperforms existing hypercuboid and fuzzy rough set methods, achieving a superior accuracy of 72.1% with a reduced feature set and demonstrating its effectiveness in identifying salient features for big data analysis. © 2024 IEEE. (A feature-scoring sketch follows below.)

Item: Outlier Detection in Streaming Data Using Deep Learning Models (Institute of Electrical and Electronics Engineers Inc., 2024)
Dudipala, S.; Gangavarapu, S.; Girish, K.K.; Bhowmik, B.
In the realm of the Internet of Things (IoT), devices continuously generate a vast and relentless stream of data, providing a real-time representation of the digital landscape. The continuous, high-velocity nature of this streaming data poses significant challenges for real-time analysis. Accurate outlier detection within this data is essential, as anomalies may indicate critical issues, attacks, or errors; yet the dynamic, rapidly evolving characteristics of streaming data render traditional outlier detection methods inadequate. This paper investigates the application of Artificial Neural Networks (ANNs), specifically a Multi-Layer Perceptron (MLP), for outlier detection in streaming IoT data. The MLP was selected from a range of Deep Neural Networks (DNNs) for its balance between computational efficiency and model complexity. Its efficacy is confirmed through rigorous experimentation, demonstrating strong performance across diverse scenarios and data classes: the MLP achieved an accuracy of 99.4%, underscoring its ability to detect even minor deviations from expected patterns and establishing it as a robust tool for outlier detection in dynamic IoT environments. © 2024 IEEE. (A streaming-MLP sketch closes the examples below.)
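
For the Ganglia item above, here is a minimal sketch of how a monitoring script might poll cluster metrics, assuming Ganglia's gmond daemon is reachable on its default TCP port 8649, where it streams cluster state as XML. The host name and the choice of cpu_idle and mem_free as metrics of interest are illustrative assumptions, not details taken from the paper.

import socket
import xml.etree.ElementTree as ET

GMOND_HOST = "hadoop-master.example.com"  # assumed monitoring host
GMOND_PORT = 8649                         # gmond's default TCP port

def fetch_gmond_xml(host: str, port: int) -> str:
    """Read the XML metric dump that gmond streams on connect."""
    chunks = []
    with socket.create_connection((host, port), timeout=10) as sock:
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")

def summarize_utilization(xml_text: str) -> dict:
    """Extract per-host CPU-idle and free-memory readings from the dump."""
    root = ET.fromstring(xml_text)
    summary = {}
    for host in root.iter("HOST"):
        metrics = {m.get("NAME"): m.get("VAL") for m in host.iter("METRIC")}
        summary[host.get("NAME")] = {
            "cpu_idle_pct": metrics.get("cpu_idle"),
            "mem_free_kb": metrics.get("mem_free"),
        }
    return summary

if __name__ == "__main__":
    for host, stats in summarize_utilization(
            fetch_gmond_xml(GMOND_HOST, GMOND_PORT)).items():
        print(host, stats)

A poller like this, run on a schedule, is one plausible way to feed the proactive resource-management decisions the abstract describes.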
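
For the runtime-prediction item, a hedged PySpark sketch of the hybrid idea: a closed-form analytical estimate is computed per job and then fed as an extra feature to a learned regressor, so the learned model only has to correct the analytical model's errors. The schema, file path, and the simple cost formula are assumptions for illustration; the abstract does not specify the paper's actual analytical model.

from pyspark.sql import SparkSession, functions as F
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import RandomForestRegressor

spark = SparkSession.builder.appName("hybrid-runtime-sketch").getOrCreate()

# Assumed job-history schema: input_gb, shuffle_gb, num_executors, runtime_s.
jobs = spark.read.parquet("hdfs:///logs/job_history.parquet")  # assumed path

# Analytical component: a toy cost model in which runtime grows with data
# volume and shrinks with parallelism.
jobs = jobs.withColumn(
    "analytical_est_s",
    (F.col("input_gb") + F.col("shuffle_gb")) / F.col("num_executors") * 60.0,
)

# Machine-learning component: learn residual structure from history, with the
# analytical estimate included as one of the input features.
assembler = VectorAssembler(
    inputCols=["input_gb", "shuffle_gb", "num_executors", "analytical_est_s"],
    outputCol="features",
)
train, test = assembler.transform(jobs).randomSplit([0.8, 0.2], seed=42)

model = RandomForestRegressor(
    featuresCol="features", labelCol="runtime_s", numTrees=100
).fit(train)

model.transform(test).select("runtime_s", "prediction").show(5)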
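
For the feature-selection item, a sketch of the score-and-weight skeleton that a Spark-based selector might follow: score every feature in one distributed pass, apply a weighting scheme, and keep the top-k. The abstract does not give the fuzzy rough set formulation, so a chi-square statistic stands in here as the relevance measure; the column names, data path, redundancy weights, and the assumed categorical "label" column are all illustrative.

from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.stat import ChiSquareTest

spark = SparkSession.builder.appName("feature-selection-sketch").getOrCreate()

feature_cols = ["f1", "f2", "f3", "f4"]                # assumed column names
df = spark.read.parquet("hdfs:///data/train.parquet")  # assumed path

assembled = VectorAssembler(
    inputCols=feature_cols, outputCol="features"
).transform(df)

# One distributed pass yields a test statistic per feature.
row = ChiSquareTest.test(assembled, "features", "label").head()

# Illustrative weighting: damp the scores of features believed redundant.
redundancy_weight = {"f1": 1.0, "f2": 0.8, "f3": 1.0, "f4": 0.6}  # assumed
scored = sorted(
    ((name, stat * redundancy_weight[name])
     for name, stat in zip(feature_cols, row.statistics.toArray())),
    key=lambda pair: pair[1],
    reverse=True,
)

top_k = [name for name, _ in scored[:2]]  # keep the strongest features
print("selected features:", top_k)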
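
For the outlier-detection item, a hedged sketch of training an MLP incrementally on a simulated IoT stream. scikit-learn's partial_fit stands in for the streaming update step; the layer sizes, batch size, outlier rate, and the synthetic data generator are assumptions rather than the paper's setup.

import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
model = MLPClassifier(hidden_layer_sizes=(64, 32))  # assumed architecture
scaler = StandardScaler()

def synthetic_batch(n=256, n_features=8):
    """Stand-in for a labeled micro-batch pulled from an IoT stream."""
    X = rng.normal(0.0, 1.0, size=(n, n_features))
    y = np.zeros(n, dtype=int)
    outliers = rng.random(n) < 0.05  # ~5% anomalous readings
    X[outliers] += rng.normal(6.0, 1.0, size=(int(outliers.sum()), n_features))
    y[outliers] = 1
    return X, y

# Consume the stream in micro-batches, updating scaler and model online.
for _ in range(50):
    X, y = synthetic_batch()
    Xs = scaler.partial_fit(X).transform(X)   # track running feature scales
    model.partial_fit(Xs, y, classes=[0, 1])  # one incremental gradient pass

X_test, y_test = synthetic_batch(1024)
print(f"held-out accuracy: {model.score(scaler.transform(X_test), y_test):.3f}")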
