A Comprehensive Review on Scaling Machine Learning Workflows Using Cloud Technologies and DevOps

dc.contributor.author: Ramesh, G.
dc.contributor.author: Vaikunta Pai, T.
dc.contributor.author: Birǎu, R.
dc.contributor.author: Poojary, K.K.
dc.contributor.author: Abhay
dc.contributor.author: Shingad, A.R.
dc.contributor.author: Sowjanya, N.
dc.contributor.author: Popescu, V.
dc.contributor.author: Mitroi, A.T.
dc.contributor.author: Nioata, R.M.
dc.contributor.author: Kiran Raj, K.M.
dc.date.accessioned: 2026-02-05T13:17:15Z
dc.date.issued: 2025
dc.description.abstract: Scaling Machine Learning (ML) workflows in cloud environments presents critical challenges in ensuring reproducibility, low-latency inference, infrastructure reliability, and regulatory compliance. This review addresses the lack of a comprehensive synthesis of how integrated DevOps practices and cloud-native technologies enable scalable, production-grade ML systems. We analyze the convergence of MLOps with tools such as Kubernetes, Jenkins, and Terraform, detailing their role in automating CI/CD pipelines, infrastructure provisioning, and model lifecycle management. The review highlights strategies for optimizing resource utilization, minimizing inference latency, and managing data versioning across hybrid and multi-cloud architectures (AWS, Azure, GCP). We also examine serverless computing, container orchestration, and monitoring practices to enhance scalability and governance. By categorizing challenges chronologically and evaluating emerging practices such as federated learning and security-by-design, this work bridges a key gap in existing literature. It offers a unified perspective on building reliable, reproducible, and compliant ML workflows, thereby advancing the state of scalable AI system engineering. © IEEE.
dc.identifier.citation: IEEE Access, 2025, Vol. 13, pp. 148559-148594
dc.identifier.uri: https://doi.org/10.1109/ACCESS.2025.3599281
dc.identifier.uri: https://idr.nitk.ac.in/handle/123456789/28227
dc.publisher: Institute of Electrical and Electronics Engineers Inc.
dc.subject: automation
dc.subject: cloud computing
dc.subject: DevOps
dc.subject: Kubernetes
dc.subject: Machine learning (ML) workflows
dc.subject: MLOps
dc.subject: scalability
dc.title: A Comprehensive Review on Scaling Machine Learning Workflows Using Cloud Technologies and DevOps
