Leveraging Hybrid Modeling for Enhanced Runtime Prediction in Big Data Jobs
Date
2024
Publisher
Institute of Electrical and Electronics Engineers Inc.
Abstract
In an era of rapid data expansion, big data has transformed many industries, redefining how data is processed, analyzed, and used. The widespread adoption of digital technologies has driven this surge, leading to an unprecedented accumulation of information from sources such as social media, sensors, and transactions. As big data evolves, it presents both significant challenges and unique opportunities, and innovative solutions are needed to realize its full potential. One critical challenge in big data environments is accurately predicting job runtimes, which is essential for optimizing resource utilization and enhancing overall system performance. Current approaches, including analytical models and machine learning algorithms, often struggle to manage the complexities of unstructured data while remaining interpretable. This paper proposes a novel hybrid modeling approach that integrates the strengths of both techniques to improve job runtime predictions. The hybrid architecture combines an analytical model, which captures the characteristics of jobs and execution environments, with a machine learning model trained to detect patterns and relationships in historical data. As demonstrated on real-world big datasets, the hybrid model achieves greater accuracy by merging these capabilities. Using PySpark and advanced feature engineering techniques, the model adapts dynamically to varying dataset sizes and complexities, ensuring robust performance across different scenarios. © 2024 IEEE.
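The abstract does not specify how the analytical and machine learning components are combined. One common way to build such a hybrid is to let the learned model correct the residual error of the analytical estimate; the minimal sketch below illustrates that idea only and is not the authors' implementation. The cost formula, the function names (`analytical_runtime`, `fit_residual_model`, `hybrid_runtime`), and the default throughput and overhead values are all hypothetical, and plain Python stands in for PySpark to keep the example self-contained.

```python
from statistics import mean


def analytical_runtime(input_gb, executors,
                       gb_per_sec_per_executor=0.05, overhead_sec=10.0):
    """Hypothetical first-principles estimate: fixed overhead plus scan time.

    The throughput and overhead defaults are illustrative assumptions.
    """
    return overhead_sec + input_gb / (gb_per_sec_per_executor * executors)


def fit_residual_model(history):
    """Fit residual = a + b * input_gb by ordinary least squares.

    history: list of (input_gb, executors, observed_runtime_sec) tuples.
    The residual is the observed runtime minus the analytical estimate.
    """
    xs = [h[0] for h in history]
    rs = [h[2] - analytical_runtime(h[0], h[1]) for h in history]
    x_bar, r_bar = mean(xs), mean(rs)
    var = sum((x - x_bar) ** 2 for x in xs)
    # With no spread in input sizes, fall back to a constant correction.
    b = (sum((x - x_bar) * (r - r_bar) for x, r in zip(xs, rs)) / var
         if var else 0.0)
    a = r_bar - b * x_bar
    return a, b


def hybrid_runtime(input_gb, executors, residual_model):
    """Analytical estimate plus the learned residual correction."""
    a, b = residual_model
    return analytical_runtime(input_gb, executors) + a + b * input_gb
```

In this sketch the analytical term keeps the prediction interpretable, while the fitted residual absorbs systematic effects that the formula misses. A fuller version would presumably use richer job features and a more expressive regressor.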
Keywords
Analytical Models, Big Data, Hybrid Modeling, Job Runtime Prediction, Machine Learning, PySpark
Citation
COSMIC 2024 - IEEE International Conference on Computing, Semiconductor, Mechatronics, Intelligent Systems and Communications, Proceedings, 2024, pp. 48-53
