Leveraging Hybrid Modeling for Enhanced Runtime Prediction in Big Data Jobs

dc.contributor.author: Singh, R.
dc.contributor.author: Zadokar, V.N.
dc.contributor.author: Kumar, S.
dc.contributor.author: Doddamani, S.S.
dc.contributor.author: Bhowmik, B.
dc.date.accessioned: 2026-02-06T06:33:38Z
dc.date.issued: 2024
dc.description.abstract: In an era of rapid data expansion, big data has transformed many industries, redefining how data is processed, analyzed, and used. The widespread adoption of digital technologies has driven this surge, leading to an unprecedented accumulation of information from sources such as social media, sensors, and transactions. As big data evolves, it presents significant challenges and unique opportunities, necessitating innovative solutions to fully leverage its potential. One critical challenge in big data environments is accurately predicting job runtimes, which is essential for optimizing resource utilization and enhancing overall system performance. Current approaches, including analytical models and machine learning algorithms, often struggle to manage the complexities of unstructured data while remaining interpretable. This paper proposes a novel hybrid modeling approach that integrates the strengths of both techniques to improve job runtime predictions. The hybrid architecture combines an analytical model, which captures the intricate characteristics of jobs and execution environments, with a machine learning model trained to detect patterns and relationships in historical data. As demonstrated on large real-world datasets, the hybrid model achieves greater accuracy by merging these capabilities. Using the flexibility of PySpark and advanced feature engineering techniques, the model dynamically adapts to various dataset sizes and complexities, ensuring robust performance across different scenarios. © 2024 IEEE.
dc.identifier.citation: COSMIC 2024 - IEEE International Conference on Computing, Semiconductor, Mechatronics, Intelligent Systems and Communications, Proceedings, 2024, p. 48-53
dc.identifier.uri: https://doi.org/10.1109/COSMIC63293.2024.10871292
dc.identifier.uri: https://idr.nitk.ac.in/handle/123456789/28772
dc.publisher: Institute of Electrical and Electronics Engineers Inc.
dc.subject: Analytical Models
dc.subject: Big Data
dc.subject: Hybrid Modeling
dc.subject: Job Runtime Prediction
dc.subject: Machine Learning
dc.subject: PySpark
dc.title: Leveraging Hybrid Modeling for Enhanced Runtime Prediction in Big Data Jobs
