Conference Papers
Permanent URI for this collection: https://idr.nitk.ac.in/handle/123456789/28506
Leveraging Hybrid Modeling for Enhanced Runtime Prediction in Big Data Jobs (Institute of Electrical and Electronics Engineers Inc., 2024). Singh, R.; Zadokar, V.N.; Kumar, S.; Doddamani, S.S.; Bhowmik, B.

In an era of rapid data expansion, big data has transformed many industries, redefining how data is processed, analyzed, and used. The widespread adoption of digital technologies has driven this surge, producing an unprecedented accumulation of information from sources such as social media, sensors, and transactions. As big data evolves, it presents significant challenges as well as unique opportunities, and fully leveraging its potential requires innovative solutions. One critical challenge in big data environments is accurately predicting job runtimes, which is essential for optimizing resource utilization and improving overall system performance. Current approaches, including analytical models and machine learning algorithms, often struggle to handle the complexities of unstructured data while remaining interpretable. This paper proposes a novel hybrid modeling approach that integrates the strengths of both techniques to improve job runtime predictions. The hybrid architecture combines an analytical model, which captures the characteristics of jobs and their execution environments, with a machine learning model trained to detect patterns and relationships in historical data. By merging these capabilities, the hybrid model achieves greater accuracy, as demonstrated on real-world big datasets. Built on PySpark and incorporating advanced feature engineering techniques, the model adapts dynamically to different dataset sizes and complexities, ensuring robust performance across scenarios. © 2024 IEEE.
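The abstract does not say exactly how the analytical and machine learning components are combined. One common way to build such a hybrid is residual learning. An analytical model produces a baseline runtime estimate, and a learned model predicts the baseline's error from a feature the analytical model ignores. The sketch uses plain Python rather than PySpark so that it stays self-contained. Everything in it is a hypothetical illustration, not the paper's implementation: the `Job` fields, the overhead and throughput constants, the single-feature linear residual learner, and `HybridRuntimeModel`.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Job:
    # Hypothetical job features; the paper's actual feature set is not given.
    input_gb: float    # input data size in GB
    executors: int     # number of executors allocated
    shuffle_gb: float  # data shuffled between stages in GB


def analytical_runtime(job: Job, overhead_s: float = 10.0,
                       gb_per_s_per_executor: float = 0.5) -> float:
    """Assumed analytical model: fixed startup overhead plus input size
    divided by aggregate throughput (per-executor rate times executors)."""
    return overhead_s + job.input_gb / (gb_per_s_per_executor * job.executors)


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = a + b * x for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0  # fall back to a constant offset if x never varies
    a = mean_y - b * mean_x
    return a, b


class HybridRuntimeModel:
    """Analytical estimate plus a learned correction for its residual error."""

    def __init__(self) -> None:
        self.a = 0.0
        self.b = 0.0

    def fit(self, jobs: List[Job], runtimes_s: List[float]) -> "HybridRuntimeModel":
        # Residual = the part of the observed runtime the analytical model missed.
        residuals = [t - analytical_runtime(j) for j, t in zip(jobs, runtimes_s)]
        # Learn the residual from a feature the analytical model ignores.
        self.a, self.b = fit_line([j.shuffle_gb for j in jobs], residuals)
        return self

    def predict(self, job: Job) -> float:
        return analytical_runtime(job) + self.a + self.b * job.shuffle_gb


# Usage with synthetic history (illustrative only, not data from the paper):
# true runtime = analytical estimate + 2 s + 3 s per shuffled GB.
history = [Job(50, 5, 1), Job(200, 10, 2), Job(80, 4, 3), Job(120, 8, 5)]
observed = [analytical_runtime(j) + 2 + 3 * j.shuffle_gb for j in history]
model = HybridRuntimeModel().fit(history, observed)
new_job = Job(100, 10, 4)
print(analytical_runtime(new_job), model.predict(new_job))
```

On this synthetic history, the analytical model alone underestimates the new job. The hybrid recovers the shuffle-dependent term the analytical model ignores. In a real pipeline, the residual learner would more likely be a richer model trained on engineered features, for example a gradient-boosted regressor from Spark MLlib. The linear fit is used here only to keep the sketch dependency-free.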
