CONTEXT AWARE DATACENTER LOAD BALANCER

Thesis submitted in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY

by ASHWIN KUMAR

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING, NATIONAL INSTITUTE OF TECHNOLOGY KARNATAKA, SURATHKAL, MANGALORE - 575025

June 2020

DECLARATION by the Ph.D. Research Scholar

I hereby declare that the Research Thesis entitled Context Aware Datacenter Load Balancer, which is being submitted to the National Institute of Technology Karnataka, Surathkal in partial fulfillment of the requirements for the award of the Degree of Doctor of Philosophy in Computer Science and Engineering, is a bonafide report of the research work carried out by me. The material contained in this Research Thesis has not been submitted to any University or Institution for the award of any degree.

(CS13P01, Ashwin Kumar) (Register Number, Name & Signature of Research Scholar)
Department of Computer Science and Engineering
Place: NITK, Surathkal. Date: June 29, 2020

CERTIFICATE

This is to certify that the Research Thesis entitled Context Aware Datacenter Load Balancer submitted by Ashwin Kumar (Register Number: CS13P01) as the record of the research work carried out by him is accepted as the Research Thesis submission in partial fulfillment of the requirements for the award of the degree of Doctor of Philosophy.

Dr. Annappa B, Research Supervisor (Name and Signature with Date and Seal)
Dr. Alwyn Roshan Pais, Chairman - DRPC (Signature with Date and Seal)

Acknowledgments

In this incredible journey of my Ph.D., I had the fortune to meet and work with many outstanding people, without whom I could never have done it. Their encouragement allowed me to push the envelope beyond what I initially thought feasible. For this reason, I would like to mention the individuals to whom I owe my genuine admiration and gratitude.

I would not have known research and its depth had I not been given the opportunity to pursue it. A significant share of the credit is due to my research supervisor, Prof. Annappa, who gave me the chance to work with him in the significant field of cloud computing. There are no words to express how grateful I am to my research supervisor, who supported me throughout my journey of ups and downs. His constant support, patience, guidance, and supervision allowed me to focus on my research work. His unrelenting passion for quality and soundness in research, and his incredible professionalism, work ethic, and forgiving nature have been a defining inspiration for my endeavors. Thank you, sir, for always leading me on the right path.

I am enormously thankful to the Research Progress Assessment Committee members, Dr. Mohit P. Tahiliani and Dr. R. Madhusudhan, for their insightful comments, critical questions, and valuable ideas. Their continuous interest in my research progress and their sharp and quick feedback in all matters greatly helped me in achieving my research objectives.

I humbly thank Dr. Alwyn Roshan Pais, HoD and Chairman (DRPC), and Dr. Basavaraj Talwar, Secretary (DRPC), for helping me in research-related aspects. I am also thankful for the support and advice received from Dr. Shashidhar G Koolagudi.

My journey during the Ph.D. would not have been as exciting had I not met my seniors Dr. Likewin Thomas, Dr. Sumith, and Dr. Manoj. They did the hand-holding during my initial duration of the course, and I also share some memorable fun moments with them. I am thankful to my friends Mrs. Keerthi Shetty and Mr.
Vishnu for helping with many institutional procedures and guiding me with much course-related information. The feedback and advice they gave also helped me improve the quality of my research-related documents many fold. I also want to thank my juniors Mrs. Saraswati and Mr. Manjunath for their help in reaching people and resources when I was away from the institution. I should also thank Dr. Priyaranjan Sharma for sharing a room in the hostel and for being there for me whenever I needed something during coursework.

I am thankful from the bottom of my heart to my colleagues at Sony, Mr. Uday Kiran A and Mr. Prashanth P, for supporting me throughout my Ph.D. journey. The amount of trust they have in me encouraged me to further my research without any worries. I should also thank Mr. Jaison Joseph and Dr. Kanchana Gopinath for motivating me to take up research and for their inspiring thoughts to excel in my profession. I want to thank my close friends Mr. Vishnu, Mr. Malleshi, Mr. Bahubali, Mr. Rudragouda, Mr. Harish, and Mr. Santosh for supporting and motivating me throughout my Ph.D. journey.

I would humbly thank the non-teaching staff of my department. Mr. Dinesh Kamath and Mrs. Yashawanthi were supportive in ensuring that research-related seminars went well and uninterrupted.

I want to thank my parents, Dr. Dattatraya Kulkarni and Mrs. Maya Kulkarni, and my wife, Mrs. Priyanka, for being the main pillars of this journey and for their constant and unconditional love. They understood and encouraged me in all my decisions. Without their support, this thesis would not have been possible; without their constant encouragement, this journey would have been left incomplete. I should thank my brother, Dr. Chetan, for being there for me during my difficult phases. I should also thank my little son, Avnish, for helping me get rid of tiredness with his smiles. I would like to thank my aunt, Mrs. Chaya Kulkarni, who has been a friend, guide, and mentor throughout my studies. With this, I would like to thank all of my family members for all the support and encouragement I have always received.

In the end, all that remains is to thank you, dear reader. If you have found at least a small part of this thesis useful or interesting, you have made all my work worthwhile.

Finally, my deepest gratitude to the Almighty for helping me get through this!

Place: Surathkal
Date: June 29, 2020
Ashwin Kumar

Abstract

The ever-increasing demand for cloud adoption is prompting researchers and engineers around the world to make the cloud more efficient and beneficial for all stakeholders, including cloud service providers and cloud service users. Cloud computing will bring profits for all when cloud resources are used efficiently and its services are made affordable for businesses by reducing their cost. Managing a cloud data center incurs high costs, including capital expenditure for procuring the necessary IT infrastructure at the beginning and recurring operational expenditure for data center management, which covers power, manpower, and maintenance. Data center owners need to reduce data center management costs by employing efficient resource provisioning techniques that save energy and reduce cost without affecting service level agreements.

Load balancing is one of the critical aspects of cloud data centers that can significantly improve resource utilization and performance, and save energy, by properly assigning/reassigning computing resources to the incoming requests.
Therefore, how to schedule user tasks to virtual machines and virtual machines to physical servers effectively by considering various dynamic parameters is an evolving research problem in cloud computing.

The proposed work investigates contextual parameters such as physical machine characteristics, data center load conditions, and electricity prices at geo-distributed data center locations to propose an energy- and cost-efficient load balancing technique for cloud data centers. Physical machine characteristics, such as the performance-to-power-consumption profile, are utilized for virtual machine placement decisions in data centers to optimize total energy consumption and improve throughput. The context of peak and non-peak load conditions is used to avoid virtual machine placement optimization overheads and to efficiently utilize power-efficient physical servers. Electricity prices vary with geographical location across the globe. The electricity price, along with response times, is considered to distribute data center loads optimally across geo-distributed data centers to save total power costs. The proposed work also investigates current challenges for efficient graphics processing unit (GPU) resource utilization in virtualized environments.

The work proposes a context-aware load balancing technique that ensures better power-efficient resource utilization, enhances performance by avoiding overheads, and also saves total power costs of the data centers. The experimental results indicated that our proposed context-aware load balancer helps to save around 2-10% of power for synthetic workloads and 1-3% for real workload traces in the data centers. The experimental results also attested that our proposed cost-aware cloud service broker load distribution technique for geo-distributed data centers can save around 15-23% of the power costs of the data centers.

Table of Contents

Abstract
Table of Contents
List of Figures
List of Tables
Abbreviations and Nomenclature

1 Introduction
  1.1 Cloud computing
    1.1.1 Definition
    1.1.2 Characteristics
    1.1.3 Service models
    1.1.4 Deployment models
    1.1.5 Advantages
  1.2 Virtualization in cloud computing
    1.2.1 Virtualization
    1.2.2 Characteristics of virtualization
    1.2.3 Benefits of virtualization
  1.3 Load balancing in cloud data centers
  1.4 Background for research
  1.5 Motivation
  1.6 Research contributions
  1.7 Outline of the thesis
  1.8 Summary

2 Literature Survey
  2.1 Overall data center management costs
  2.2 Power and cost optimization
    2.2.1 VM placement optimization
    2.2.2 Load balancing in geo-distributed data centers
    2.2.3 VM level load balancing policies in CloudAnalyst
  2.3 GPU enabled computing resource management
  2.4 Research gaps identified
  2.5 Problem statement
  2.6 Research objectives
  2.7 Summary

3 VM Placement Optimization
  3.1 Background study
  3.2 Research objective
  3.3 Proposed system architecture
    3.3.1 Local context manager
    3.3.2 Global workload scheduler
  3.4 Power efficiency of physical machines
  3.5 Load condition based adaptations
  3.6 Proposed context-aware VM placement optimization
    3.6.1 VM placement optimization process
    3.6.2 VM placement algorithm (PPABFD)
    3.6.3 Host underload condition
    3.6.4 Load context detection in datacenter
  3.7 Experimental evaluation
    3.7.1 Performance metrics
    3.7.2 Experimental setup
    3.7.3 Experiment 1: Synthetic workload with a variable number of VMs
    3.7.4 Experiment 2: Real-world workload with multiple PM types
  3.8 Summary

4 Electricity cost-aware load balancing in geo-distributed data centers
  4.1 Background study
  4.2 Research objective
  4.3 Electricity cost-aware cloud service broker policy
  4.4 Experimental setup
    4.4.1 CloudAnalyst
    4.4.2 Experimental configurations
  4.5 Experimental results and analysis
  4.6 Summary

5 Peak hour Performance Improvement for ESCE Algorithm
  5.1 Background study
    5.1.1 Task scheduling in cloud data centers
    5.1.2 Equally spread current execution load algorithm (ESCE)
  5.2 Research objective
  5.3 Proposed VM load balancer
  5.4 Experimental setup
  5.5 Experimental results and analysis
  5.6 Summary

6 Load balancing in GPU enabled Cloud: Challenges and Opportunities
  6.1 Background study
    6.1.1 GPUs and cloud datacenters
    6.1.2 GPU virtualization in cloud
  6.2 Research objective
  6.3 GPU resource provisioning techniques in cloud
  6.4 Current challenges with GPU computing in the cloud
    6.4.1 Challenges with GPU resource management in cloud
    6.4.2 Challenges with programming vGPUs
  6.5 Summary

7 Conclusions and Future Work
  7.1 Summary of contributions
  7.2 Directions for future work

References
List of Publications

List of Figures

1.1 Cloud Service Models
1.2 Cloud Deployment Models
1.3 Instances Of Over And Under Provisioning
1.4 Virtualized Physical Machine
2.1 Classification Of VM Placement Schemes
2.2 Overview Of The Proposed Work
3.1 System Block Diagram For Proposed VM Placement Optimization
3.2 Local Context Manager Architecture
3.3 Global Workload Scheduler Architecture
3.4 Proposed Host Selection Technique For VM Placement And Host Shutdown
3.5 Load Context Aware VM Placement Optimization Process
3.6 CloudSim Layered Architecture
3.7 Comparison Of Power Consumption Results For Synthetic Workload
3.8 VM Migrations Results For Synthetic Workload
3.9 Overall SLA Violations Results For Synthetic Workload
3.10 Number Of Host Shutdowns For Synthetic Workload
3.11 Power Consumption Results For Real Workload
3.12 VM Migrations Results For Real Workload
3.13 Overall SLA Violations For Real Workload
3.14 Number Of Host Shutdowns Reported For Real Workload
4.1 Cloud Application Service Broker
4.2 Block Diagram Of CloudAnalyst
4.3 Request Assignment Percentage
4.4 Comparison Of Power Costs
5.1 Task Scheduling Model In Cloud
5.2 Call Flow Diagram For Proposed Load Balancer
5.3 Comparison Results For 5 VMs In DC
5.4 Comparison Results For 25 VMs In DC
6.1 Block Diagram Of Typical GPU And Video Card
6.2 Typical Flow Of GPU Task Execution In An Application
6.3 System View Of GPU Enabled Virtualized Server And User VM With GPU And CPU Tasks

List of Tables

1.1 Comparison Of Cloud Deployment Models
1.2 Cost For Cloud Data Center Owners
2.1 Summary Of Related Past Works In VM Placement And Optimization
2.2 Important Past Work Related To Geo-distributed Data Center Load Balancing
2.3 Important Past Work Related To GPU Provisioning Policies
3.1 Power And Performance Metrics From SPECpower Benchmark
3.2 Physical Machine Configurations In Data Center
3.3 Virtual Machine (VM) Configurations Used In DC
3.4 Evaluation Results For Performance Metrics For Synthetic Workload
3.5 Physical Machine Types Used In Experiment 2
3.6 Evaluation Results Of Performance Metrics For Real World Workload
4.1 Electricity Cost Matrix Representation
4.2 Electricity Cost Table
4.3 Data Center Configurations
4.4 User Base Configurations
4.5 Transmission Delay Matrix Between Regions (in msec)
4.6 Proposed Service Broker Request Assignments
4.7 Summary Of Power Costs For Proposed Technique
5.1 Example Of VM Allocation By ESCE Algorithm
5.2 User Bases: Regionwise Statistics Of Users
5.3 Comparison Results For 5 VMs Case
5.4 Comparison Results For 25 VMs Case

Abbreviations and Nomenclature

CPU      Central Processing Unit
CUDA     Compute Unified Device Architecture
DC       Data Center
DC Id    Data Center Identifier
ESCE     Equally Spread Current Execution Load
FCFS     First Come, First Served
FF       First Fit Policy
FFI      First Fit Increasing Policy
GPU      Graphics Processing Unit
GWS      Global Workload Scheduler
HPC      High Performance Computing
IT       Information Technology
LCM      Local Context Manager
Mbps     Megabits per second
MIPS     Million Instructions Per Second
NAS      Network Attached Storage
OpenCL   Open Computing Language
OS       Operating System
PCIe     Peripheral Component Interconnect Express
PM       Physical Machine
QoS      Quality of Service
RAM      Random Access Memory
ROI      Return On Investment
SIMD     Single Instruction Multiple Data
SLA      Service Level Agreement
SM       Streaming Multiprocessor
vCPU     Virtualized Central Processing Unit
VDI      Virtual Desktop Infrastructure
vGPU     Virtualized Graphics Processing Unit
VM       Virtual Machine
VMM      Virtual Machine Manager
vRAM     Virtualized Random Access Memory

Chapter 1

Introduction

1.1 Cloud computing

Cloud computing, a long-held dream of offering computing as a utility, has the potential to transform the way a major portion of IT businesses and internet services work. It is an idea that enables software, hardware, and other services to be rented to a large base of customers with added flexibility. Businesses with innovative ideas for internet services no longer need to invest large capital in computing hardware or software upfront to deploy their services to end users. This means cloud computing benefits small-scale businesses, which can start operations with minimal capital spent on IT infrastructure under a pay-as-you-go model. Cloud computing (Armbrust et al., 2009) eliminates the necessity to predict the application load in advance for provisioning computing resources.
The computing resources in the cloud are scaled as per demand, avoiding both the resource wastage of over-provisioning and the loss of business from under-provisioning. The human effort to maintain computing hardware and software is also avoided by offloading maintenance to cloud providers. The term cloud computing refers to both software (platform software and application software) offered as an internet service and the computing hardware in data centers. It was estimated that enterprises would spend 33% more on cloud services or solutions in 2019, and it is predicted that 80% of IT businesses will rely on the cloud instead of conventional infrastructure by 2025. Cloud computing is the fastest-growing market, with its investments expected to cross $214 bn in 2019 (Jain, 2019).

1.1.1 Definition

The standard definition of cloud computing provided by NIST (Mell and Grance, 2011) is as below:

"Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction."

1.1.2 Characteristics

Cloud computing infrastructure is made up of powerful computing nodes (composed of physical hardware and software entities) and large storage units connected by a high-speed network. The cloud services are made available through internet protocols for users located anywhere in the globe. The abstraction layer called the hypervisor (or VMM), sitting above the physical layer, decouples user workloads from the underlying physical resources; this abstraction layer manifests the typical cloud characteristics. The five important characteristics typical of any cloud computing infrastructure are described briefly in this section.

1. On-demand self-service - Consumers can provision resources from the cloud unilaterally, without the need for human intervention from cloud service providers.

2. Broad network access - Cloud computing capabilities are available over a network and can be accessed using standard network protocols from heterogeneous client devices.

3. Resource pooling - Cloud service providers pool resources for multiple customers using a multi-tenant model, wherein different virtual and physical resources are assigned and reassigned dynamically as per the changing demands from customers. Examples of resources are storage, memory, compute power, and network bandwidth.

4. Elasticity - Cloud capabilities can be provisioned elastically and released automatically. This feature allows the application to scale up or down as per current demand, often giving the user the illusion of unlimited resource provisioning capability.

5. Measured service - Provisioned cloud resources are monitored, controlled, and reported, providing transparency for both users and cloud providers to enable fair costing as per terms and usage.

1.1.3 Service models

Cloud computing services are offered in three distinct service models (Mell and Grance, 2011) to suit different customer requirements, as shown in figure 1.1.

1. Software as a Service (SaaS)
The applications deployed on the cloud are offered as a service to customers. Cloud users can access these applications using thin clients through web browsers or software tools.
Except for limited user-specific application configuration settings, customers need not bother about managing or controlling the application software or the underlying cloud infrastructure.

2. Platform as a Service (PaaS)
The development environment is encapsulated into a software layer and offered as a service, upon which higher levels of service can be built by the customers. Users have the freedom to build, configure, and run their applications making use of the abstracted APIs provided by the platform. Customers need not worry about managing the software platform offered as a service by cloud providers.

3. Infrastructure as a Service (IaaS)
In this model, fundamental resources such as compute power, memory, storage, and network are provisioned to customers as a service. Customers can typically deploy their own software on the provisioned infrastructure to build and offer application services to their clients. Such provisioned resources can be accessed through a simple command-line tool or a lightweight user interface. Customers need not be bothered about managing the underlying cloud infrastructure offered to them.

Figure 1.1: Cloud Service Models

With IaaS used to host, PaaS used to build, and SaaS used to consume, these three cloud computing models enable networked access to a pool of shared configurable resources such as servers, networks, storage, applications, and services on demand.

1.1.4 Deployment models

The cloud infrastructure is set up and accessed using one of four deployment models (Armbrust et al., 2009) as per the business needs. A cloud deployment model represents a specific genre of cloud environment, distinguished primarily by size, type of ownership, and access. The four popular deployment models are explained below.

1. Public clouds
Public clouds are publicly accessible cloud environments hosted by third-party cloud providers. The cloud services are offered on demand for a defined cost. Here, cloud providers are responsible for the creation and management of public clouds and their information technology (IT) resources. Public clouds are characterized by elasticity and utility pricing in the provisioning of their resources.

2. Community clouds
Community clouds are similar to public clouds except that access to community clouds is restricted to a specific group of cloud users involved in a shared goal. A community cloud can be owned, managed, and operated by one or more organizations in the community or by a third-party cloud provider.

3. Private clouds
Private clouds are provisioned and managed for the exclusive use of a single organization and used by cloud consumers from its different departments. Private clouds are either owned and managed by the single organization itself or by a third-party cloud provider. A private cloud can exist on or off the company premises.

4. Hybrid clouds
Hybrid clouds are a composition of two or more of the deployment models, governed by a set of business rules. Hybrid clouds can be complex architectures to create and manage because of heterogeneous cloud environments and split cloud management responsibilities between public cloud providers and private cloud owners. Hybrid clouds are used when organizations restrict the movement or storage of sensitive data into public clouds.

The cloud deployment models are shown in figure 1.2, and the differences between the models (Jain, 2019) are presented in table 1.1.
Figure 1.2: Cloud Deployment Models

Table 1.1: Comparison Of Cloud Deployment Models

Parameter                   | Private               | Public     | Community                          | Hybrid
Data security and privacy   | High                  | Low        | Comparatively high                 | High
Scalability and flexibility | High                  | High       | Fixed capacity                     | High
Ease of setup and use       | Requires IT expertise | Easy       | Requires IT expertise              | Requires IT expertise
Reliability                 | High                  | Vulnerable | Comparatively high                 | High
Cost effectiveness          | Most expensive        | Cheapest   | Cost is shared among the community | Cheaper than private but costlier than public

1.1.5 Advantages

Some of the major benefits associated with using cloud computing services for businesses are mentioned below.

1. Investment cost
Cloud computing services free businesses from high capital investments for hardware and software at the beginning. Lower initial investments help smaller businesses to start operations with smaller capital.

2. Availability
Cloud providers ensure round-the-clock reliable services to customers by maintaining 99.9% uptime for servers.

3. Scalable capacity
The services provided by the cloud can be scaled both upwards and downwards as per dynamically changing resource demands. The scalability helps businesses rapidly increase service capacity when demand grows and optimize costs by reducing capacity during non-peak seasons.

4. Carbon footprint
Cloud services help organizations reduce carbon footprints by allocating computing resources that are just sufficient to meet current demands and avoiding any over-provisioning.

5. Maintainability
Cloud computing exempts users from IT maintenance worries and provides simplified ways to manage and control the rented services. The quality and continuity of user services are guaranteed by the SLA agreements.

1.2 Virtualization in cloud computing

Cloud computing can exist without virtualization, but it would be difficult and inefficient: computing resources, software, or platforms would still be delivered as a service and on demand over the Internet, but without elastic multiplexing of the underlying hardware. Virtualization is a key enabler technology and a vital factor in the success story of cloud computing. Virtualization (Vmware, 2019) technology makes cloud infrastructure elastic, efficient, and fault-tolerant.

Consider the three cases shown in figure 1.3. Even when the peak load is accurately determined and resources are provisioned for it, there is resource wastage, as in case (a). Cases (b) and (c) show how changing resource demands cause loss of business through under-provisioning as well as resource wastage. Virtualization helps companies mitigate mismatches between resource demand and allocation at run time.

1.2.1 Virtualization

Virtualization (Vmware, 2019) is the process of creating a software-based, or virtual, representation of something, such as virtual servers, storage, and networks. Figure 1.4 shows a typical system stack of a virtualized physical host. The hypervisor (or virtual machine manager) is a thin software layer that enables multiple virtual machines (VMs), each running its own copy of a guest operating system (OS), to run simultaneously on a single physical machine. The hypervisor provides an abstracted hardware version to each of the running virtual machines and multiplexes the underlying hardware resources efficiently. The OS running inside each virtual machine assumes complete control of the underlying hardware, and the virtualization framework, through the hypervisor layer, provides this illusion to the VMs.
Each VM runs independently and in isolation, so that run-time problems in one VM do not affect other co-located VMs on the same physical host.

Figure 1.3: Instances Of Over And Under Provisioning

Figure 1.4: Virtualized Physical Machine

Virtualization is most effective in reducing IT expenses and helps improve resource efficiency and agility in business operations of all scales. Virtualization enables data center providers to manage resource demands from users with fewer physical machines, saving power and reducing cost.

1.2.2 Characteristics of virtualization

The following are the key characteristics of virtualization (Vmware, 2019) technology.

A. Partitioning
Virtualization enables multiple operating systems to run on the same physical machine. The underlying system resources are divided between multiple virtual machines.

B. Isolation
VMs run in isolation, and failure of a VM has no impact on other co-located VMs. Virtualization technology provides security and fault isolation at the hardware level.

C. Encapsulation
The state of a virtual machine can be saved to a file. It is also possible to copy or move virtual machines as easily as moving and copying a file.

D. Hardware independence
Virtual machines can be provisioned on or migrated to any physical host (server). This helps in server consolidation and in load balancing the data center workload.

1.2.3 Benefits of virtualization

The following are some of the benefits of virtualizing resources in data centers.

1. Instant provisioning and on-demand scalability
2. Live migration support
3. Optimization of resource utilization
4. Server consolidation to save power, and load balancing of physical resources for better response times
5. Ease of maintenance and low downtime
6. Security and fault isolation
7. Simplified data center management

Virtualization enables data centers to self-manage computing infrastructure in ever-changing load conditions using dynamic load balancing techniques.

1.3 Load balancing in cloud data centers

A distributed system such as a cloud data center can be viewed as a collection of heterogeneous computing, storage, and network resources shared between active users. The users of such a distributed system have different goals, specific objectives, and business-driven strategies, and their behaviors are complex to characterize. In such a complex system, the management of hardware resources and software platforms/applications is a very intricate task. The goal of load balancing is to improve the performance and efficiency of such a distributed system through uniform and fair distribution of the application load across available computing nodes. Load balancing is a critical aspect of the cloud computing environment that helps to improve resource utilization, enhance performance, and save energy by efficiently assigning/reassigning computing resources to user workloads/requests. A general formulation (Grosu and Chronopoulos, 2004) of the load balancing problem is: given a large number of tasks, find the allocation of tasks to computing infrastructure that optimizes a given objective function (e.g., total execution time). Load balancing in the cloud is a method (Geeta and Singh, 2014) to distribute workloads across many servers, network interfaces, hard drives, or other computing resources. Cloud data centers are composed of large, powerful (and expensive) computing servers and storage, connected by network infrastructure.
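The general formulation above can be written compactly. The following is a minimal sketch in LaTeX notation; the symbols are illustrative (not drawn from the cited source), and the makespan is used only as one example objective:

```latex
Let $T = \{t_1, \dots, t_n\}$ be the tasks, $M = \{m_1, \dots, m_k\}$ the
computing nodes, and $c_{ij}$ the estimated execution cost of task $t_i$ on
node $m_j$. A load balancing policy chooses binary assignment variables
$x_{ij} \in \{0, 1\}$:
\[
  \min_{x}\; \max_{j \in \{1,\dots,k\}} \sum_{i=1}^{n} c_{ij}\, x_{ij}
  \qquad \text{s.t.} \qquad \sum_{j=1}^{k} x_{ij} = 1 \;\; \forall i .
\]
The inner sum is the load placed on node $m_j$; the min-max objective asks
for the assignment whose most loaded node finishes earliest.
```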
These resources are associated with the usual risks of hardware failures, power interruptions, and resource overloads during high demand.

Load balancing in cloud computing differs from classical thinking on load-balancing architecture (S.Jyothsna, 2016). Load balancing virtualized resources in cloud data centers offers new opportunities and also a new set of unique challenges. Load balancing in cloud data centers is used to make sure no resources remain idle (or underused) while others are being overused. To balance the load distribution, some of the workloads may be migrated from overloaded source nodes to relatively lightly loaded destination nodes. When resource demands are not high, the load balancing technique may choose to power off some servers by migrating their workloads to other nodes in the data center to save energy.

Load balancing algorithms (Nadeem and Mohammed, 2015) are broadly classified into two types: static and dynamic. When load distribution decisions are carried out during runtime considering the current state of the system, the process is called dynamic load balancing. If load variations in the system are low, static load balancing is usually employed. Static load balancing requires prior information about the system resources to make load distribution decisions and does not take the dynamic state of the system into account for decision making.

The goals of a load balancing mechanism in cloud data centers can be summarized as below.

1. Improve the overall throughput substantially with optimal resource utilization.
2. Save energy through server consolidation when the load on the data center is not high.
3. Provide a backup plan in case the system fails, even partially.
4. Maintain system stability by monitoring server overload conditions.
5. Accommodate run-time changes in the data center's load (demand) and resource availability (capacity).

The load balancing technique used in cloud data centers needs to be sophisticated and intelligent enough to consider various parameters to meet the given objective function. Load balancing decisions in the cloud environment are carried out at three different levels, as explained below; a small illustrative sketch follows the list.

1. Cloud broker
In geo-distributed data centers or in a multi-datacenter setup, the cloud broker is responsible for routing user requests to a particular data center (DC) for processing. Load balancing at the cloud broker may need to consider parameters such as proximity of the DC, network latency to the DC, or any other business-related constraint.

2. VM-PM mapping process
When a user requests new virtual machines (VMs) in a data center to meet current business demand, the system creates the new VMs and places them on suitable physical hosts. Multiple VMs are placed on a single physical host to share the underlying resources. The load balancer in the DC is responsible for physical machine (PM) selection for initial VM placement and for VM migration to another PM during host overload or the server consolidation process.

3. Task-VM mapping process
When user requests arrive at the data center, each has to be assigned to a VM for processing. The selection of a particular VM for request (task) assignment is done by the task load balancer in the DC. The task load balancer may take into account parameters such as VM state (idle, busy) or the number of requests assigned, etc., for making assignment decisions.
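To make the third level concrete, the following Python sketch shows a least-loaded task-to-VM balancer in the spirit of equal-spread algorithms such as ESCE. All names (VirtualMachine, LeastLoadedBalancer) are hypothetical and for illustration only; this is not the exact algorithm proposed in later chapters.

```python
from dataclasses import dataclass

@dataclass
class VirtualMachine:
    vm_id: int
    active_tasks: int = 0  # number of requests currently assigned

class LeastLoadedBalancer:
    """Assigns each incoming task to the VM with the fewest active tasks."""

    def __init__(self, vms):
        self.vms = list(vms)

    def assign(self, task_id):
        # Pick the VM with the minimum current load (ties broken by vm_id).
        target = min(self.vms, key=lambda vm: (vm.active_tasks, vm.vm_id))
        target.active_tasks += 1
        return task_id, target.vm_id

    def complete(self, vm_id):
        # Called when a task finishes, so the load counts stay accurate.
        for vm in self.vms:
            if vm.vm_id == vm_id:
                vm.active_tasks = max(0, vm.active_tasks - 1)
                return

# Example: three VMs, five incoming requests.
balancer = LeastLoadedBalancer(VirtualMachine(i) for i in range(3))
for task in range(5):
    print(balancer.assign(task))  # tasks spread evenly: VMs 0, 1, 2, 0, 1
```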
1.4 Background for research

Cloud computing is growing at an overwhelming rate, with many internet-based applications being migrated to cloud data centers at an ever-increasing pace. Companies like Amazon, Microsoft, and Google are expanding their cloud data centers to serve their vast user bases spread across the globe. Setting up a cloud data center needs a large initial investment in IT hardware and software, along with some non-IT expenses, and later incurs ongoing costs such as data center administration costs and huge power costs to keep the data center up for 24x7 operations.

Table 1.2 lists the share of the costs of the various components used in building cloud data centers. The cost is amortized to obtain a common cost run-rate metric that can be used for both one-time costs (for the purchase of servers, etc.) and ongoing maintenance expenses (for power costs). Though these cost shares may vary slightly with time and geographical position, these are the major overall costs involved for data center owners.

Table 1.2: Cost For Cloud Data Center Owners

Cost share in % (amortized) | Component type    | Sub-components
45%                         | Servers           | Physical resources such as CPU, memory, and storage
25%                         | Infrastructure    | Power distribution lines and cooling
15%                         | Power consumption | Electrical utility costs
15%                         | Network           | Links, transits, and other equipment

It can be noted from table 1.2 that power consumption costs contribute a significant share of the overall data center management (ongoing) costs, and any saving in power costs can help data center owners reduce significant cost in the long run. It is noted that 59% of the total power consumption in the data center is attributed to servers, so even a small decrease in the power consumption of servers will have the largest impact on the total power costs of data centers. Additionally, it may save cooling costs. According to the United States data center energy usage report (Berkeley, 2016), in the year 2014 alone data centers in the U.S. consumed an estimated 70 billion kWh, which was equal to about 1.8% of total U.S. electricity consumption. The electricity usage by U.S. data centers is expected to reach 73 billion kWh in 2020.

Electricity prices vary from one geographical location to another. The electricity price depends on several factors and is governed by the domestic rules of each geographic location. The factors that may have an impact on the electricity price include the technology employed, the raw materials used, and the output volume involved in the generation of the electricity. It can be noted that various cloud providers are building data centers at geographically dispersed locations across the globe to ensure availability and performance for their user applications.

It is vital for cloud providers to reduce data center management costs to offer competitive pricing for users. Power costs are one of the significant portions of the overall data center management costs, and it is going to be beneficial for both cloud providers and users to optimize the cost of power consumption without violating the service level agreements of customers.

1.5 Motivation

The following are the most important facts and observations that compelled us to explore our curiosity in this direction.

1. Power efficiencies of heterogeneous physical servers
The data center is a server farm consisting of a large number of heterogeneous physical machines connected by a high-speed shared network.
These physical machines often tend to vary in terms of their computing capacity, composition, and also in their power consumption characteristics at different load conditions. Such heterogeneity in the composition of physical machines results in some of these machines being more power-efficient than others during their operation in the data center. It is feasible to optimize power consumption in the data center by detecting power-efficient physical servers and efficiently scheduling workloads to them.

2. Non-uniform electricity costs across geographical locations
Electricity (power) prices vary with geographical location across the globe, and many cloud providers are setting up data centers at multiple geographical locations to cater to their users. It is possible to optimize the cost of serving each request in a geo-distributed data center network by routing requests to the most cost-effective yet quickest data center available at that time.

3. Non-uniform load conditions (peak and non-peak) in data centers
Data centers experience varying load conditions at different times of the day. One of the goals of the load balancer in virtualized environments like cloud data centers is to adjust the workloads to the available physical resources (VM placement optimization) as per changing load conditions, to enhance resource utilization and also to improve performance. This is usually done considering the load conditions at each physical server (overload or underload). It is possible to improve the load balancing algorithms to consider global (intra-DC) load conditions (load context) to make optimal workload placement decisions.

4. Increase in demand for supporting efficient GPU computing in cloud
Many cloud providers have begun offering GPU-enabled services for customer applications where GPUs are essential or where high computational power is needed to meet the desired QoS. Though virtualization solutions for CPUs have matured well for use in data centers, the same conventional virtualization techniques do not apply to GPUs because of inherent differences in architecture and operation. There is a need to study the existing issues with GPU-enabled VM provisioning, replacement, and power optimization from the perspective of resource management, and also to investigate the difficulties faced by application developers in designing their algorithms to efficiently utilize virtualized GPUs (vGPUs) in the cloud.

1.6 Research contributions

The following contributions of this research work are available to the research community in the form of journal and conference publications.

• Information on contextual parameters for power and cost-saving in the cloud environment: It provides information about the various parameters that can constitute the context of cloud DCs, including physical machine power and performance characteristics in heterogeneous DCs, varying electricity costs in a multi-DC setup across the globe, and dynamic load conditions in DCs.

• Framework for detection of context in DCs: The context of the DC is classified as the local and global context. Detection techniques for the local (at each host) and global (overall load) contexts in the DC are proposed.

• Physical machine characteristics and load condition aware VM placement optimization: A new VM placement optimization technique considering the power and performance characteristics of each physical host and the overall load condition in the data center is proposed.
• Electricity cost-aware request routing technique in geo-distributed DCs: A cost-optimizing request routing cloud broker technique considering varying electricity costs across the globe in a geo-distributed multi data center setup is proposed.

• Peak hour performance improvement of task load balancer (ESCE): A modification to the existing ESCE algorithm is proposed to improve peak hour processing efficiency. The proposed method overcomes the over-allocation problem in the algorithm to maintain uniform allocations in the current state of the system.

• Identifying research challenges for efficient GPU computing in the cloud: Existing research challenges concerning resource management and programming for GPUs in virtualized environments are discussed.

1.7 Outline of the thesis

This section briefly describes each chapter of this thesis, to give a brief overview of its structure.

• Chapter 1, the current chapter, introduces the general domain of cloud computing and motivates the need for new load balancing techniques that are capable of considering contextual parameters for cost and energy saving for data center owners.

• Chapter 2 describes the literature survey related to the problems and past solutions for the objectives addressed in this thesis.

• Chapter 3 explains the proposed context-aware VM placement optimization technique for power saving in cloud data centers.

• Chapter 4 describes our proposed cost-aware request routing technique in a geo-distributed data center scenario.

• Chapter 5 explains our proposal for peak hour performance improvement for the Equally Spread Current Execution (ESCE) load balancing algorithm, a task-to-VM load balancer used in data centers.

• Chapter 6 presents our study of the current infrastructure for supporting GPU computing in cloud data centers and the existing challenges concerning resource provisioning and programming for virtual GPUs in a cloud setup.

• Chapter 7 concludes the thesis and summarizes the contributions in more detail. Furthermore, the possible directions in which the proposed methods and techniques can be improved further are also briefly discussed.

1.8 Summary

This chapter briefly covered the concepts of cloud computing, virtualization, and load balancing. It then presented the background for the proposed research, the motivation, and the contributions of the reported work, and concluded with the outline of the thesis. In the next chapter, the important literature relevant to the research problem addressed in this thesis is discussed, along with the research gaps identified, the problem definition, and the research objectives.

Chapter 2

Literature Survey

In the previous chapter, we introduced the concepts of cloud computing, virtualization, and load balancing in cloud environments. This chapter presents the set of scientific literature we have referred to in this thesis. Several contributions by researchers all over the world have helped us in identifying the research gaps and addressing them by proposing suitable solutions. The problem of reducing data center management costs is addressed in this thesis.

2.1 Overall data center management costs

Setting up a data center incurs huge capital investment at the beginning, and later cloud providers have to pay ongoing operational expenses for power bills and other maintenance tasks at regular intervals.
It is noted by a study (Hamilton, 2019) that power consumption cost is the second-largest contributor to overall data center management costs after servers. It is also estimated that power costs are going to dominate the maintenance costs of large-scale modern data centers in the future. A study (Greenberg et al., 2009) conducted on the cost estimation of cloud service data centers observed that power costs contribute a share of about 15% of the overall data center management costs, and 59% of power consumption costs are attributed to the power consumed by the data center servers. It is noted that any decrease in the power consumption of servers will have the largest impact on the overall data center power costs. Any decrease in the power consumption by the data center servers will also lead to reduced cooling costs.

Clearly, the data center management costs can be significantly reduced if power costs are optimized in the data centers. Any reduction in data center management costs will increase the return on investment (ROI) for data center owners; it will benefit cloud providers and can, in turn, reduce the costs of the cloud services for users.

2.2 Power and cost optimization

Power and operating cost optimization in cloud data centers is an evolving research area, given the stochastic nature of the problem. The overall power cost optimization can be addressed by any one or all of the following techniques.

1. Optimize the overall power consumption of servers.
2. Utilize resources from data centers where power is relatively cheap.
3. Improve resource utilization and throughput to avoid resource wastage in the data center.

Many important techniques have been proposed to optimize the cost of data center management by improving resource utilization and exploiting power-saving opportunities. This section discusses some of the noted past works that have motivated our research.

2.2.1 VM placement optimization

Although the notion of VMs and virtualization has been a game-changer for the IT industry, VM placement brings many challenges that need to be addressed in cloud computing (Abdelsamea et al., 2014). VM placement needs to be optimal to meet performance goals, optimize network usage, reduce resource costs, and also save energy. The VM placement optimization strategy can be QoS-aware, power-aware, cost-aware, network-aware, GPU-aware, or a combination of these. VM placement schemes can be broadly classified into two types (Masdari et al., 2016), as shown in figure 2.1.

1. Static VM placement: The mapping between VMs and PMs is fixed throughout the lifetime of the VMs based on application-specific requirements. The VM-PM mappings may not undergo changes for a long time, and static VM placement does not involve VM migrations. Static VM placement is generally not power-efficient, as it does not adapt to changing conditions in the data center.

2. Dynamic VM placement: The initial mapping of VMs to PMs is changed based on state changes in the load of the system.

Dynamic VM placement schemes can be further classified into two types based on when the VM placement is initiated.

1. Proactive VM placement: The initial mapping of VMs to PMs is changed before the system reaches a certain condition.

2. Reactive VM placement: The initial mapping of VMs to PMs is changed after the system reaches a certain condition.

The change in mapping may be induced by several factors such as performance, maintenance, power, or load situations.
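The reactive/proactive distinction can be illustrated with a small Python sketch. The threshold value and the linear trend predictor are assumptions made purely for illustration; real systems use richer host models and forecasters:

```python
OVERLOAD_THRESHOLD = 0.8  # assumed CPU utilization threshold

def reactive_trigger(current_util: float) -> bool:
    # Reactive: act only after the host is already overloaded.
    return current_util > OVERLOAD_THRESHOLD

def proactive_trigger(util_history: list, horizon: int = 3) -> bool:
    # Proactive: act when a simple linear trend predicts overload soon.
    if len(util_history) < 2:
        return False
    slope = util_history[-1] - util_history[-2]
    predicted = util_history[-1] + slope * horizon
    return predicted > OVERLOAD_THRESHOLD

print(reactive_trigger(0.85))          # True: already overloaded
print(proactive_trigger([0.5, 0.7]))   # True: upward trend predicts overload
```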
Figure 2.1: Classification Of VM Placement Schemes

We are not interested in static VM placements in our reported work, as they do not help save power under the ever-changing load conditions in the data center. The reported work focuses on the objective of power cost minimization, by saving power through dynamic VM placements and VM placement optimizations.

A. Power and cost-saving in data center

Some of the noted past works that deal with power consumption minimization and energy saving are discussed in this section.

An adaptive heuristics-based, performance-efficient, and energy-saving technique (Beloglazov and Buyya, 2012) for dynamic consolidation of VMs in cloud data centers has been proposed. The authors presented a competitive analysis and proved competitive ratios of optimal online deterministic algorithms. The authors addressed the problems of VM migration and dynamic VM consolidation. The paper proposed a novel solution for dynamic consolidation of VMs based on the analysis of historical data on resource usage by VMs and the power consumption statistics of the host machines to arrive at VM placement decisions.

A novel technique (Chiang et al., 2014) to utilize server idle power in the data center to minimize operational costs has been proposed. The authors first studied the problem of controlling service rates and optimizing the operational cost of data centers. The authors then formulated a three-parameter cost function that takes into account the costs of power consumption, system congestion, and server startup. A green control algorithm was proposed to solve the constrained optimization problem of cost-saving and to make cost-versus-performance tradeoffs in physical machines with different power-saving policies, without violating the performance SLAs promised to users.

A performance interference aware virtual machine placement strategy (Moreno et al., 2013) to avoid performance bottlenecks caused by non-compatible VMs co-hosted on the same servers has been proposed. The paper proposes a novel technique for energy-efficient workload allocation that considers the VM workload characteristics and host internal interference levels to select a suitable physical host for a given workload.

A technique (Guo and Fang, 2013) to utilize energy storage available in data centers to reduce the overall electricity costs in wholesale electricity markets has been proposed. The authors considered the scenario where the price of electricity varies both spatially and temporally. The proposed technique integrates center-level load balancing with server-level configuration and battery management, and at the same time ensures the quality-of-service (QoS) for users. The paper utilizes Lyapunov optimization to achieve a tradeoff between energy storage and cost-saving.
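Several of the surveyed consolidation techniques build on variants of a bin-packing heuristic such as best fit decreasing. The following Python sketch shows the general pattern of a power-aware best-fit-decreasing placement loop. It is an illustrative simplification (all names and the linear power model are assumptions), not the exact algorithm of any cited paper or of the technique proposed in Chapter 3:

```python
from dataclasses import dataclass

@dataclass
class Host:
    host_id: int
    capacity_mips: float
    used_mips: float = 0.0
    idle_power_w: float = 100.0   # assumed idle power draw
    max_power_w: float = 250.0    # assumed fully loaded power draw

    def power(self, extra_mips: float = 0.0) -> float:
        # Simple linear power model: idle plus a utilization-proportional part.
        util = (self.used_mips + extra_mips) / self.capacity_mips
        return self.idle_power_w + (self.max_power_w - self.idle_power_w) * util

def place_vms(vms_mips, hosts):
    """Power-aware best fit decreasing: place big VMs first, each on the
    host whose total power draw increases the least."""
    placement = {}
    for vm_id, mips in sorted(enumerate(vms_mips), key=lambda kv: -kv[1]):
        candidates = [h for h in hosts if h.used_mips + mips <= h.capacity_mips]
        if not candidates:
            raise RuntimeError(f"no host can fit VM {vm_id}")
        best = min(candidates, key=lambda h: h.power(mips) - h.power())
        best.used_mips += mips
        placement[vm_id] = best.host_id
    return placement

hosts = [Host(0, 10000.0), Host(1, 8000.0, max_power_w=200.0)]
print(place_vms([2500.0, 1200.0, 3000.0], hosts))  # VMs gravitate to host 1
```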
The proposed work addressed the challenges with VM placement for energy saving by building a computational model for energy consumption in data center. An energy-aware scheme for VM placement optimization is proposed for power consumption reduction and improving load balance in the data centers. A technique based on genetic algorithm and tabu search algorithm called GATA(Zhao et al., 2019) is proposed. The goal of the proposed technique is to achieve optimal VM placements and energy saving in the data centers. A variant of Particle swarm optimization (PSO)(Dashti and Rahmani, 2015) to address the problem of incompatibility between user requests and physical machine specification causing the performance degradation and power wastage in data centers is proposed. A modified PSO algorithm is proposed to migrate the VMs from the overloaded hosts and also a dynamic server consolidation technique to save power is presented. They demonstrated that the proposed solution can reduce power con- sumption and improve performance. The virtual machine placement problem with the goal of minimizing the power consumption in the data center is addressed using the heuristics-based approach(Li et al., 2013). Authors studied the wastage of resources in the physical machines due to imbalance created in utilization of multi-dimensional resources of the host machines. Authors proposed a multi-dimensional space partition model called EAGLE to overcome the imbalance in resource utilization and reduce power consumption in the data center. A profit-maximizing technique(Toosi et al., 2014) for cloud service providers by optimizing the allocation of data center capacity to each pricing plan utilizing 23 Table 2.1: Summary Of Related Past Works In VM Placement And Optimization Serial Primary Mech- Goal of proposed No anism Authors,Year work Limitations 1 Adaptive Heuris- Anton and Rajku- Minimizing total energy Does not consider Per- tics mar,2012 consumption of data- formance characteristicscenter. of physical hosts. Yi-Ju Chiang et. Optimizes operational Technique considers2 Green control al,2015 cost of datacenter and only idle power in DCensures SLA guarantee. to save cost. 3 Dynamic pro- Adel Nadjaran Maximizing profit for Does not consider en-gramming Toosi et.al,2015 data center owners. ergy saving. Seyed Ebrahim 4 PSO based Dashti and Amir Minimizing energy con- Technique does not con- Masoud Rah- sumption and ensures sider power efficiency of mani,2015 QoS for users. PMs. Does not consider power 5 Heuristics based Li, X.et. al,2013 Minimizing total energyconsumption. efficiency of PMs anddoes not guarantee QoS. Noumankhan Minimizing perfor- 6 Best fit decreasing Sayeedkhan, P. mance degradation due Does not consider en- and S. Balaji,2014 to interference. ergy saving. 7 Graph theory Xiao, Z., et al., Minimizing energy con- Technique is not powerbased 2015 sumption. and Qos aware. 8 ACO based Dong, J.-k., et Reduce communication Technique is not power-al.,2014 traffic in DC network. aware. 9 Greedy algorithm Kanagavelu, R., Reduces inter-VM traf- Technique does not ad-based et al.,2014 fic and network load. dress energy saving. 10 Integer program- Li, W., J. Tords- son, and E. Elm- Ensures QoS for users. Technique does not ad-ming roth,2012 dress power saving. Maximize resource uti- Technique does not con- 11 Automata-based Liu, C., et al,2014 lization and minimize sider power efficiency of communication traffic. PMs. 
Lyapunov Opti- Yuanxiong Guo Minimizing power costs Technique does not ad-12 mization and Yuguang in the variable pricing dress power consump-Fang,2013 market. tion reduction. Genetic algorithm Abdelkhalik Minimizing overall cost Technique does not con-13 based et.al,2015 and SLA violations. sider power efficiency ofPMs. Interference Ismail Solis Minimizing energy con- Technique does not con-14 aware algorithm Moreno et.al,2013 sumption and perfor- sider power efficiency ofmance aberrations. PMs and QoS. 15 Affinity aware Sujesha and Minimizing network re- The technique only con-VM placement Kulkarni, 2011 source utilization. siders network latency. 16 power-aware VMplacement Zhao et.al, 2019 Minimizing power usage Technique is not power by host shutdown. and QoS aware. 24 admission control for resource reservations is proposed. The authors proposed an optimization technique based on the formulation of stochastic dynamic programming and two heuristics that consider trade-offs between computational complexity and optimality. The proposed technique is evaluated using real workload traces of Google to prove the effectiveness of the solution. The problem of performance degradation due to resource contention with disk i/o when two or more disk intensive VMs are co-hosted on a physical server is discussed. Authors(Sayeedkhan et al., 2014) proposed a best fit decreasing(BFD) allocation tech- nique based on the static disk threshold-based migration scheme for disk-intensive task scheduling in a cloud computing environment to overcome the problem. Some of the past works also attempted to solve VM placement optimization for network traffic minimization in the data center using techniques such as Ant colony optimization(Dong et al., 2014), network affinity aware scheme(Sudevalayam and Kulkarni, 2011) and greedy based schemes(Kanagavelu et al., 2014). The VM placement optimization problem is also addressed for ensuring QoS for users at all times by using Integer programming(Li et al., 2011) technique and to also meet hybrid objectives such as maximizing resource utilization and reduce communication traffic using automata-based schemes(Liu et al., 2014). The problem of VM placement optimization has been addressed in the past using different approaches/algorithms to achieve different desired objectives as dis- cussed above. Table 2.1 summarizes these important related works with their primary mechanism and goals achieved by each one of them. 2.2.2 Load balancing in geo-distributed data centers Many cloud providers are setting up geographically dispersed data centers to cater to increased computing demands from user applications and also reduce re- sponse times. When multiple DCs are serving user requests, it is vital to determine which DC and which PM to assign to fulfill the request for computation. It is also important to meet additional constraints like minimum cost, optimal power, etc. We have investigated the issue of load distribution among available geographically dis- tributed data centers considering the operational expenses involved. Some of the 25 noted literature that is relevant to our study are discussed in this section. A study(Ashikur et al., 2014) of power management problem of data center operations and various aspects that influence the power costs is reported. The authors discussed the current state of art technologies and proposed methods to improve the power management in the data centers. 
The paper also proposes to utilize the smart grid environment to ensure an efficient and dynamic power management solution for data centers.

A priority-based round-robin algorithm (Mishra et al., 2014) is proposed to schedule requests from user bases to data centers when multiple data centers are available in the same region. The data centers are assigned priorities, and requests are assigned based on a round-robin strategy, improving performance compared to the proximity-based routing service broker algorithm (Wickremasinghe et al., 2010).

A DVFS-based operational cost optimization solution (Gu et al., 2015) is proposed for the geo-distributed data center scenario. The proposed technique exploits dynamic frequency scaling for power consumption management; an optimization problem is formulated and solved that reduces the operational expenses of the data center without affecting the quality of service for user tasks.

A game theory based algorithm (Tripathi et al., 2017) for load balancing is proposed to optimize the operating cost in geo-distributed data centers. The authors modeled the load balancing problem as a non-cooperative game, with operating expenditures modeled as a linear combination of power and latency costs. The proposed technique models load balancing as a cost optimization game and obtains a Nash equilibrium structure; based on the obtained structure, a novel algorithm is proposed to minimize operating expenses.

The cloud service broker is responsible for routing requests from users to one of the cloud data centers among the geographically dispersed data centers. A proximity-based request routing technique (Wickremasinghe et al., 2010) is proposed that routes users to the nearest available data center in terms of transmission delay. The authors also proposed a best-response-time service routing policy that estimates the response times of all available data centers for the current request; the DC with the smallest estimated response time is allocated to the user request.

A framework (Nadjaran Toosi et al., 2017) for reactive load balancing to distribute requests for web applications among multiple available data centers is proposed. The load balancing algorithm routes user requests based on the renewable energy sources available at the locations of the data centers. The authors suggest that the proposed technique can reduce power costs through reduced utilization of brown energy.

A response-time-sensitive load balancing solution is proposed for the distributed, heterogeneous data center scenario. An offline solution based on the force-directed scheduling technique (Goudarzi and Pedram, 2013) can determine the application placement on a particular DC over a long period of time. The offline algorithm is further extended to support online application placement in a distributed DC with migrations. Predictions about application lifetimes, workload volumes, and renewable energy sources are considered for decision making.

The authors of (Toosi and Buyya, 2015) proposed a fuzzy-based algorithm to exploit the temporal variations of power costs and the available renewable energy, to reduce power costs and increase the utilization of renewable energy. The proposed algorithm is tested with real workload traces from the National Renewable Energy Laboratory and the Energy Information Administration and is found to reduce costs to a significant extent.
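To make the cost/latency trade-off explored by these works concrete, the following is a minimal sketch of a cost-aware service broker (our illustration for this survey, not the algorithm of any cited paper; the data center names, prices, and weighting scheme are hypothetical). It scores each data center by a weighted combination of its current electricity price and its estimated response time, restricted to data centers whose estimated response time meets the SLA:

```python
from dataclasses import dataclass

@dataclass
class Datacenter:
    name: str
    electricity_price: float  # current price at this location, e.g. $/kWh
    est_response_ms: float    # estimated response time for the requesting user

def route_request(dcs: list[Datacenter], sla_ms: float,
                  price_weight: float = 0.5) -> Datacenter:
    """Pick the DC minimizing a weighted cost of electricity price and
    (SLA-normalized) latency; fall back to all DCs if none meets the SLA."""
    feasible = [d for d in dcs if d.est_response_ms <= sla_ms] or list(dcs)
    return min(feasible,
               key=lambda d: price_weight * d.electricity_price
                             + (1 - price_weight) * d.est_response_ms / sla_ms)

if __name__ == "__main__":
    dcs = [Datacenter("us-east", 0.09, 120.0),
           Datacenter("eu-west", 0.14, 40.0),
           Datacenter("ap-south", 0.07, 220.0)]
    print(route_request(dcs, sla_ms=150.0).name)  # picks eu-west here
```

The SLA filter encodes the point made in the research gaps of section 2.4: electricity price alone should not drive routing when doing so would inflate user response times.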
Table 2.2: Important Past Work Related To Geo-distributed Data Center Load Balancing

Authors, year | Primary mechanism | Problem addressed | Limitations
(Wickremasinghe et al., 2010) | Proximity-based request routing | Distribution of load based on DC location | Dynamic electricity pricing is not used
(Nadjaran Toosi et al., 2017) | Renewable energy utilization | Reducing power cost for data centers through renewable energy | Response times for users are not considered
(Le et al., 2017) | Advance energy procurement in a multi-timescale electricity market | Reducing power procurement costs | Does not consider response time for users
(Wickremasinghe et al., 2010) | Response time based | Improving response time for users | Dynamic electricity pricing is not considered
(Goudarzi and Pedram, 2013) | Force-directed scheduling | Improving response time for online service applications | Dynamic electricity pricing is not considered
(Gu et al., 2015) | DVFS based | Operational cost minimization while ensuring QoS | Does not consider the electricity cost for processing
(Toosi and Buyya, 2015) | Fuzzy logic based | Reducing power cost and carbon footprint | Response times for users are not considered
(Tripathi et al., 2017) | Game theory based | Minimizing operating cost and obtaining the structure of a Nash equilibrium | Does not consider dynamic electricity cost and QoS
(Mishra et al., 2014) | Priority-based round-robin | Request routing in a multi-DC situation within the same region | Does not consider electricity cost for routing
(Ashikur et al., 2014) | Global load balancing technique | Power and cost management in the smart grid environment | Does not consider response time for users

2.2.3 VM level load balancing policies in CloudAnalyst

The scheduling of user requests in cloud data centers is an NP-hard optimization problem. Load balancing of tasks on VMs is an important aspect of cloud computing for meeting several objectives such as uniform utilization, power saving, and cost saving. Effective load balancing strategies can avoid conditions like overload or underload of VM resources, which cause system failures or wastage of power. A large body of literature is available on VM-level load balancing in the cloud computing domain; we discuss the algorithms that are relevant to our work.

CloudAnalyst (Wickremasinghe et al., 2010) is an open-source, graphical user interface (GUI) based simulator for the cloud environment. CloudAnalyst offers simulation and modeling of all the important entities in the cloud and the flexibility to add and evaluate a new resource provisioning policy before it is deployed on a real cloud. CloudAnalyst provides three different VM-level load balancing strategies for users.

A round-robin policy allocates user requests to the available VMs in a circular fashion; the algorithm starts request allocation at a random VM in the data center. The round-robin load balancer has a simple implementation with little computational overhead. However, it does not consider the current load on a VM when making allocations.

The throttled load balancing policy considers the state of the VMs when assigning new requests from users. A VM is associated with two states, idle and busy. When a new request arrives at the data center, an idle VM is searched for; if a VM in the idle state is found, the request is assigned to it. If none of the VMs are idle, the request is moved to the waiting queue.
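As a minimal sketch of the two policies described so far (a simplified rendering for illustration, not the CloudAnalyst source code; the VM-id handling and notification interface are assumptions):

```python
import random
from collections import deque
from itertools import cycle

class RoundRobinBalancer:
    """Allocate requests to VMs in circular order, starting at a random VM."""
    def __init__(self, vm_ids):
        vms = list(vm_ids)
        start = random.randrange(len(vms))           # random starting VM
        self._ring = cycle(vms[start:] + vms[:start])

    def allocate(self, request):
        return next(self._ring)                      # ignores current VM load

class ThrottledBalancer:
    """Assign a request to an idle VM if one exists; otherwise queue it."""
    def __init__(self, vm_ids):
        self.idle = set(vm_ids)
        self.waiting = deque()                       # the single waiting queue

    def allocate(self, request):
        if self.idle:
            return self.idle.pop()                   # VM becomes busy
        self.waiting.append(request)
        return None                                  # request must wait

    def notify_done(self, vm_id):
        """Called when a VM finishes; hand it to a queued request if any."""
        if self.waiting:
            return self.waiting.popleft(), vm_id     # serve the queue first
        self.idle.add(vm_id)
        return None
```

The single waiting queue in ThrottledBalancer is exactly the bottleneck noted next.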
Though the throttled load balancing policy considers the state of the VMs, requests may need to wait for a long time in the single waiting queue. The equally spread current execution (ESCE) load balancing policy offers a minimal waiting time for requests by allocating the VM with the least number of assigned requests/tasks. ESCE ensures uniform request allocation to the VMs in the data center. The ESCE load balancer maintains an allocation table to keep track of requests; the state of the allocation table is updated via notifications from the data center controller about request allocations and de-allocations to VMs. However, the ESCE load balancer does not ensure uniform request allocation to VMs when the request frequency is very high (a peak load situation). Our work in this thesis offers a solution to the problem of non-uniform request allocation during peak load conditions for the ESCE load balancer.

A detailed analysis of contemporary VM load balancing algorithms in CloudAnalyst is presented in (Ajit and Vidya, 2013). Further, a weighted signature-based load balancing (WSLB) algorithm is proposed to reduce the response time for requests. WSLB calculates a load assignment factor for each host and assigns VMs based on the factor value.

A comprehensive survey of important VM-level load balancing algorithms is given in (Mishra et al., 2018). The authors present a taxonomy of load balancing schemes and cover most of the important work done in the domain of VM-level task scheduling in the cloud. An evaluation of the heuristic-based algorithms on some of the vital performance metrics is carried out using CloudSim (Calheiros et al., 2011), and a systematic comparative study of the evaluation results is presented.

2.3 GPU enabled computing resource management

GPU computing in the cloud is an emerging trend, as more and more compute-intensive, HPC, and graphics applications are hosted on cloud data centers. Though ample research has been done on CPU virtualization and the corresponding resource management techniques, GPU virtualization and the management of GPU resources in the cloud are still a growing research area. In this section, we discuss some notable past works on GPU provisioning that are in line with our research direction.

A disengaged scheduling technique (Menychtas et al., 2014) for provisioning a GPU to vGPUs is proposed. The authors utilize a disengaged timeslice with an overuse control mechanism that ensures fairness in allocation, and disengaged fair queuing is used to limit resource idle states, though the method used is probabilistic. The schedulers ensure a fair share of the GPU among all applications even when the applications are non-cooperative and adverse to each other.

The GVim (Gupta et al., 2009) scheme is proposed, which utilizes both the round-robin (RR) and Xen credit-based scheduling (XC) techniques of the Xen hypervisor for task scheduling on the GPU. RR scheduling sequentially selects a vGPU for every fixed timeslice and monitors the call buffers of the vGPUs during this period. XC uses a credit concept, which is the time allocated to each vGPU; it processes the call buffer of a vGPU for a variable time proportional to the credit amount, ensuring weighted fair sharing between guest vGPUs.

The Rain (Sengupta et al., 2013) framework is proposed for load balancing GPU requests across GPUs fitted in distributed machines. The work suggests a two-level hierarchical scheduling policy.
The top-level module of the framework distributes the load across all GPU-equipped server machines, while the bottom-level module is responsible for GPU device-level scheduling of vGPUs.

The GPUvm (Suzuki et al., 2014) scheme, which uses a BAND scheduler, is proposed to solve an issue with the credit-based scheduling scheme: credits are miscalculated when GPU idle time is included in the credit amount, which may lead to an inappropriate GPU share for certain vGPUs. GPUvm solves this issue by first transforming the CPU time of the GPU scheduler into a credit value and then subtracting the total credit value from the current vGPU.

Table 2.3: Important Past Work Related To GPU Provisioning Policies

Authors, year | GPU provisioning policy | Limitation
Menychtas et al., 2014 | Fair queueing and round robin | Framework does not consider GPU memory transfers
Gupta et al., 2009 | Round robin and credit-based | Includes GPU idle time in the credit calculation
Sengupta et al., 2013 | Priority-based and credit-based | Framework does not support heterogeneous GPUs
Suzuki et al., 2014 | Credit-based | Induces unnecessary context switches due to the credit value
Farooqui et al., 2016 | Affinity-based | Cannot be applied to applications with device-specific codes
Gupta et al., 2011 | FCFS | Does not address virtual environments
Zhang et al., 2014 | SLA based | Framework is specific to a mixture of time-constrained applications
Siavashi and Momtazpour, 2018 | Fair-share based | Does not consider memory transfer during a context switch

An investigation (Farooqui et al., 2016) of current work-stealing algorithms is conducted and the observations are reported. Existing algorithms are found to be unaware of CPU and GPU characteristics, which results in performance degradation for OpenCL-like applications capable of running on both CPU and GPU platforms. To overcome this issue, the authors proposed a framework named Libra, which first derives device affinity scores for applications; an application is then assigned to the device with the highest affinity score.

The Pegasus (Gupta et al., 2011) framework addresses one of the key challenges in GPU scheduling: the GPU virtualization layer has no means to impose a scheduling policy, because the multiplexing of the GPU is integrated into the device drivers. Pegasus proposed a concept called VCPU, with which GPUs are made basic scheduling entities. Pegasus includes proportional fair-share, FCFS, credit-based, and SLA-feedback-based schedulers; its objective is to meet the different requirements set forth by applications using the different GPU schedulers.

A framework called VGASA (Zhang et al., 2014), including adaptive scheduling policies, is proposed. These adaptive algorithms include a dynamic feedback control loop. VGASA consists of three scheduling policies: the SLA-aware algorithm receives FPS (frames per second) information and adjusts the sleep time per frame; the fair SLA-aware algorithm takes GPU time away from fast-running applications and allocates it to slow-running ones; and the enhanced SLA-aware algorithm allows all VMs to achieve the same frame rate under 100% GPU utilization.

A fair-share GPU provisioning policy is proposed in GPUCloudSim (Siavashi and Momtazpour, 2018) to share a physical GPU among multiple vGPUs. The technique allows all competing vGPUs to receive a slice of time on the GPU.
If the overall processing power of the co-located vGPUs exceeds that of the physical GPU, the processing power of the vGPUs is scaled down.

2.4 Research gaps identified

After the study of past work in the domain of resource management in cloud computing, we found the following research gaps; an honest attempt is made to address them in the work reported in this thesis.

1. Consideration of the performance-to-power ratio of physical machines for power saving in the DC
Though the power consumption profile of physical machines has been considered in past work, the performance-to-power ratio is the most appropriate indicator of the power efficiency of a physical machine. The performance-to-power ratio calculated from the SPECpower benchmark (SPEC, 2011), an industry-standard benchmark, is used to denote the power efficiency of the PMs in the DC. The optimal utilization of power-efficient machines is proposed in our reported work to save power in data centers.

2. Use of DC load conditions to improve underutilized host management
The overall load conditions (context) of the data center can be considered to improve underutilized host management in the DC. The DC load condition (peak or non-peak) can be used to avoid the overheads and resource wastage caused by host power-off sequences and VM migrations during peak load conditions in the DC.

3. Response times and electricity prices for power cost optimization in geo-distributed data centers
Some of the past literature proposed solutions such as renewable energy usage and electricity procurement during non-peak price durations to reduce power costs for data center owners. Varying electricity prices across geographical locations have also been suggested as a basis for request routing, but the estimated response time from the data center for the user request is a vital parameter for minimizing SLA violations.

4. Problem with the ESCE algorithm during peak load situations in the DC
A performance problem regarding uniform VM utilization is observed for the ESCE load balancer when the request frequency in the data center is high. The state information related to request allocations to each VM is incorrectly updated and used under peak load conditions, causing non-uniformity in the allocation of user tasks to the available VMs in the DC.

5. Scope for further investigation of efficient resource management and programming challenges for GPU computing in the cloud
Conventional virtualization techniques do not hold good for GPUs because of inherent differences in architectures, driver software, and distributed program/memory models. These differences make GPU provisioning in a virtualized environment more complex and can cause inefficiency in resource utilization. There is scope for further investigation of the underlying resource challenges for efficient GPU processing in the cloud.

2.5 Problem statement

Design a context-aware load balancing strategy for the cloud to optimize energy consumption/cost, performance, and resource utilization using physical machine, cost, and load characteristics.

2.6 Research objectives

Our research work attempts power consumption and cost optimization based on contextual parameters such as physical machine characteristics, data center load conditions, and the electricity price at that point in time. Our work also proposes a peak-hour performance improvement for data centers through an additional modification to existing solutions, and investigates efficient GPU-enabled computing in the cloud from a resource management perspective.
Figure 2.2: Overview Of The Proposed Work

The overview of the proposed work is presented in figure 2.2.

1. Power and performance characteristics aware energy saving
Analyze the power consumption vs. system throughput ratio of the physical machines in the data center to prioritize PMs for VM placements and to switch off machines during non-peak hours.

2. Electricity cost-aware request routing
Analyze the electricity costs at various geographical locations and the response times for routing user requests/tasks in the multi-data-center scenario for power cost savings.

3. Peak hour performance improvement
Detect peak and non-peak hours in data centers and suitably change the goal of load balancing to match the current situation. Also, propose modifications to existing algorithms to improve their performance during high-load situations.

4. GPU enabled computing in the cloud
Investigate current gaps in resource management policies and programming with respect to virtualized GPUs.

2.7 Summary

This chapter presented the literature review for the problem of data center management cost minimization. The chapter discussed the details of the overall data center management costs, and the impact of the power consumption cost on operating expenses was investigated. Then a literature review covering relevant past work on VM placement optimization, load balancing in geographically dispersed data center setups, task-level load balancing algorithms in CloudAnalyst, and GPU provisioning in the cloud was presented. Finally, the chapter presented the research gaps identified, the problem definition, and the research objectives addressed in this thesis. In the next chapter, a novel context-aware VM placement optimization technique for heterogeneous cloud data centers is proposed with the objective of power saving.

Chapter 3
VM Placement Optimization

The rapid expansion of cloud adoption by businesses of all scales has created the necessity of making the cloud more efficient and beneficial for both cloud service providers and their clients. Managing a cloud data center incurs huge capital expenditure at the beginning and also a high maintenance cost to keep it running at all times. The power cost forms a major share of the maintenance cost, and any reduction in power usage will greatly benefit cloud data center owners in the long run. It is noted that 59% of the total power consumption of a data center is attributed to the power consumed by the computing servers (Greenberg et al., 2009). Any decrease in the power consumption of physical servers in data centers will therefore have the largest impact on the data center maintenance cost.

Data centers usually house a large number of servers connected by a high-speed network and provided with massive storage units. The servers (physical hosts) used in a DC are heterogeneous in type, purchased from different vendors, and offer distinct compute capabilities. These heterogeneous physical servers often exhibit variability in their power consumption and performance characteristics, making some servers more power-efficient than others.

Many existing VM placement optimization techniques (Masdari et al., 2016) do not consider the power efficiency of the heterogeneous physical hosts or the prevailing load conditions in the data center during the VM provisioning and server consolidation process. Power efficiency can be described by the variability in the power consumption and throughput of two distinct machines at the same load levels.
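For a concrete feel of this variability, compare two of the servers whose SPECpower figures are reported later in table 3.1, at the 50% load level (a back-of-the-envelope illustration using the rounded published values): the HP ProLiant ML110 G4 draws 102 W at a performance-to-power ratio of 262 ops/W, while the IBM x3250 draws 73 W at 2143 ops/W, so that

PerfToPowerR_x3250(50%) / PerfToPowerR_G4(50%) = 2143 / 262 ≈ 8.2

That is, at the same relative load the IBM x3250 delivers roughly eight times as much work per watt, which is precisely the kind of difference a power-efficiency-aware placement policy can exploit.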
The data center experiences changing load conditions throughout its 24x7 operation, and it is also vital to optimize the task of VM placement to adapt to the current load conditions. In this chapter, we propose a VM placement optimization technique that reduces the total power consumption of the data center by considering the power efficiency of heterogeneous physical servers and the dynamically changing load conditions.

The rest of the chapter is organized as follows. Section 3.1 briefly introduces the task of VM placement optimization, and section 3.2 presents the objective of our proposed technique. The system architecture of the proposed VM placement optimization is described in section 3.3. The mechanism for modeling the power efficiency of physical machines is described in section 3.4, and the technique for adaptation based on data center load conditions is explained in section 3.5. Section 3.6 describes the proposed algorithms for the VM placement optimization technique, and finally, the experimental setup, configurations, and a discussion of the results obtained are presented in section 3.7.

3.1 Background study

VM placement optimization is a vital step in data center (DC) operations: it re-adjusts the VM-to-PM mappings according to the changing resource demands of applications and the physical resource availability in the data center. VM placement optimization is also helpful for server consolidation to save power during non-peak situations in the DC. Its goal is to ensure that the resource demands of user VMs are met with an optimal number of physical resources. The task of dynamic VM placement optimization can generally be split into four sub-tasks:

1. Host overload detection: the process of detecting physical server overuse, where the performance of one or more VMs residing on the server starts getting affected. It requires one or more VMs to be migrated out of the server.

2. Host underload detection: the process of detecting physical server under-utilization. Host under-utilization causes power wastage because of idle resources in the system. The situation requires the consolidation of servers by migrating VMs to other appropriate physical hosts (PMs), which enables switching off some of the servers to save power.

3. VM selection: the process of selecting a VM to be migrated, from the set of VMs residing on an overloaded host.

4. VM re-placement: the process of searching for a new suitable host (PM) to which a VM from an overloaded host can be migrated.

The task of VM placement optimization is invoked at fixed scheduled intervals in the data center. A scheduling interval of 5 minutes is used in the distributed resource scheduler (DRS) of VMware (Mosa and Paton, 2016).

3.2 Research objective

The work proposed in this chapter investigates vital contextual parameters that constitute the overall context of the data center. The following contextual parameters are considered:

1. The physical machines' performance and power characteristics.
2. The prevailing load conditions in the data center.

With the help of these contextual parameters, we propose an efficient VM-to-PM load balancing technique to optimize the overall power consumption of the data center. The objective of the proposed solution is shown as a block diagram in figure 3.1. The proposed solution considers the physical machines' performance-to-power ratio, which signifies the power efficiency of the physical hosts, and the load conditions (peak and non-peak) in the data center for VM provisioning and server consolidation.
We formulated the power consumption optimization problem as follows:

P_total(t) = \sum_{i=1}^{N} P_o(i, l, t)    (3.1)

where P_total(t) denotes the total power consumption of the cloud data center at time t, N is the number of physical machines in the data center, and P_o(i, l, t) is the power consumption of the i-th machine running at a CPU load of l% at time t.

Figure 3.1: System Block Diagram For Proposed VM Placement Optimization

The objective of the proposed technique is to optimize the value of P_total without affecting the response time of user applications, while meeting the SLAs.

3.3 Proposed system architecture

The target environment of the proposed system in this chapter is a cloud IaaS service model in a large-scale data center with N heterogeneous machines. Each node is composed of major system resources such as CPU, main memory, and network, and is connected to network-attached storage (NAS). The proposed system has no prior knowledge of the user application workloads or VM placement details. The geographically distributed users of such a cloud system can submit VM placement requests, which may comprise a dynamic mix of distinct application workloads. These dynamic mixes of application workloads, wrapped in VMs, may be co-located on a single physical server in the cloud data center. The software architecture of the proposed solution consists of two distributed modules. These modules capture the context information of the data center at both the physical machine level (local) and the data center level (global) for efficient VM provisioning and re-placement.

3.3.1 Local context manager

The local context manager (LCM) is designed to run on every physical host, at the same software layer as the hypervisor (VMM). The block diagram of the LCM is shown in figure 3.2. The LCM is responsible for collecting information about the physical host and all the co-located VMs residing on it.

Figure 3.2: Local Context Manager Architecture

The following information is collected by the LCM at each physical host and is regarded as the local context of that host:

1. Resource utilization details of all VMs (CPU, network, and memory).
2. The physical machine's remaining resource capacity at run time.
3. Run-time power consumption information.

Each host also maintains the information about its performance and power characteristics obtained from the SPECpower benchmark (SPEC, 2011). The resource utilization details of the VMs are used to determine the overall resource utilization statistics of the physical host and to detect overload/underload conditions. The remaining physical resource capacity is needed to check the feasibility of placing new VMs on the host. The run-time power consumption of the host is needed to compute the overall power consumption of the data center at any point in time. The local context manager shares these details (the local context) with the global workload scheduler (GWS) for optimal VM provisioning decisions.

3.3.2 Global workload scheduler

The global workload scheduler (GWS) is designed to run on a central resource management server or a central load balancing server in each data center. The GWS module works in tandem with the LCMs at each physical host to derive the current global context and local context for dynamic VM load balancing in the data center. The block diagram of the GWS module is shown in figure 3.3. The functions of the GWS module in the data center are:
1. Detection of the load conditions (peak or non-peak) in the data center (called the load/global context).

2. Invoking VM placement optimization at regular intervals in the data center to re-adjust the VM-PM mappings to achieve power and performance efficiency.

Figure 3.3: Global Workload Scheduler Architecture

3.4 Power efficiency of physical machines

One of the objectives of the work reported in this thesis is to consider the power efficiency of physical machines in the VM provisioning and server consolidation decisions of the VM load balancer in data centers. The power consumption of a physical machine is nothing but the collective sum of the power consumption of its sub-components such as the CPU, memory, disk, power supply unit, and cooling equipment. Some past studies (Fan et al., 2007)(Kusic et al., 2008) have noted that there exists a linear relationship between power consumption and CPU utilization. However, modern servers contain multi-core CPUs and, to support virtualization, are fitted with large RAMs, which have started to consume a significant share of the total power of physical servers. Also, the difficulty of modeling the power consumption of multi-core CPUs makes building an accurate analytical model for power consumption analysis a complex research problem (Beloglazov and Buyya, 2012). So instead of relying on an analytical model for power consumption, the work reported in this thesis uses real benchmark results for power consumption and performance metrics provided by the SPECpower benchmark (SPEC, 2011).

Data centers consist of physical machines (servers) of varying configurations and from different vendors. These physical machines do not exhibit homogeneity in their power consumption and throughput profiles. We can measure the power efficiency of a physical machine by taking the ratio of throughput (NumOps) to power consumed (Pc) at different defined load levels; the average of the values noted at the different load levels is taken as the performance-to-power ratio of the physical machine:

PerfToPowerR(Load%) = NumOps(Load%) / Pc(Load%)    (3.2)

Equation (3.2) gives the ratio of performance to power consumption at a given CPU load level for a physical machine, where Load% is calculated as the ratio of the current CPU utilization of the physical machine to its total CPU capacity in MIPS, multiplied by 100:

Load% = (cpuUtilizationMIPS(PM) / TotalCpuMips(PM)) * 100    (3.3)

Then, using the PerfToPowerR values at the different load levels, the average performance-to-power ratio is calculated as in (3.4), where N is the total number of distinct load levels considered and PerfToPowerR(L_i) is the performance-to-power ratio of the physical host at the CPU load level L_i, calculated from (3.2):

AverPerf2Pow = (1/N) \sum_{i=1}^{N} PerfToPowerR(L_i)    (3.4)

AverPerf2Pow is taken as the metric for the power efficiency of the corresponding physical machine; a higher value of AverPerf2Pow indicates a higher power efficiency.

Figure 3.4: Proposed Host Selection Technique For VM Placement And Host Shutdown

The work reported in this thesis relies on the SPECpower benchmark (SPEC, 2011) data published for several types of servers for the calculation of the power efficiency of physical machines.
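To illustrate equations (3.1)-(3.4), the following is a minimal Python sketch (not the thesis implementation; the function and variable names are illustrative). The sample figures are the 50% and 100% load columns of table 3.1 for two host types, and since SPECpower already publishes NumOps/Pc per load level, equation (3.2) reduces to a lookup here:

```python
# SPECpower-style samples: load level (%) -> (power in W, perf-to-power ratio).
# Figures below are the 50% and 100% columns of table 3.1 for two host types.
SPEC_SAMPLES = {
    "HP ProLiant ML110 G4": {50: (102.0, 262.0), 100: (117.0, 467.0)},
    "IBM x3250":            {50: (73.0, 2143.0), 100: (113.0, 2767.0)},
}

def perf_to_power(host: str, load: int) -> float:
    """Equation (3.2): NumOps/Pc at the given load level (published directly)."""
    return SPEC_SAMPLES[host][load][1]

def aver_perf2pow(host: str) -> float:
    """Equation (3.4): average of PerfToPowerR over the sampled load levels."""
    samples = SPEC_SAMPLES[host].values()
    return sum(p2p for _power, p2p in samples) / len(samples)

def total_dc_power(host_loads: list[tuple[str, int]]) -> float:
    """Equation (3.1): sum each host's power draw at its current load level."""
    return sum(SPEC_SAMPLES[host][load][0] for host, load in host_loads)

if __name__ == "__main__":
    for host in SPEC_SAMPLES:
        print(host, "AverPerf2Pow ~", round(aver_perf2pow(host)))
    print("P_total =", total_dc_power([("IBM x3250", 50),
                                       ("HP ProLiant ML110 G4", 100)]), "W")
```

With all ten published load levels fed in instead of two, aver_perf2pow reproduces the Avg P2P column of table 3.1.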
The SPECpower benchmark is the first industry-standard benchmark that evaluates the power and performance characteristics of single-node and multi-node servers. It can be used to compare power and performance among different servers and serves as a toolset for bringing about improvements in server usage and efficiency.

Table 3.1 lists the power and performance metrics of a set of servers reported in the SPECpower benchmark (SPEC, 2011). These server configurations (types) are used for the evaluation of the work proposed in this thesis. The P and P2P entries in table 3.1 represent the power consumption and performance-to-power ratios at the different load levels. Avg P2P indicates the average performance-to-power ratio (AverPerf2Pow) of the corresponding physical machine (server) type. The proposed technique prioritizes physical machines with a higher AverPerf2Pow for physical host provisioning during VM allocation/re-allocation requests. During non-peak hours, physical machines with a lower AverPerf2Pow are prioritized for power-off, ensuring that the power-efficient machines are used the most, to save power. Figure 3.4 describes the process of prioritizing physical hosts based on their power efficiency, both for new VM placement requests and when host shutdown requests for power saving are processed.

3.5 Load condition based adaptations

The VM placement optimization process has to check each physical machine (server) for load conditions (overload and underload) at regular intervals in the data center. This is done to re-map VMs to PMs according to the prevailing load conditions, to ensure the performance SLAs of user applications and to save power. In the process of achieving its goals, the VM placement optimization algorithm itself consumes significant computing power and CPU time on the servers involved. It is therefore essential to improve the VM placement optimization algorithm so that it considers the overall load conditions (context) of the data center and eliminates some of its sub-tasks, to optimize power consumption in the data center.

Host power-off and power-on sequences consume significant power and CPU time. Also, the VM migrations arising from a host power-off sequence place demands for additional resources on both the source and destination physical machines. The work reported in this thesis proposes modifications to the VM placement optimization algorithm that skip acting on host underload conditions when the data center is experiencing a peak traffic (high load) situation. The proposed modifications avoid unnecessary host power-offs and VM migrations during high load situations, helping the data center save a significant amount of power and CPU time.
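A minimal sketch of this adaptation follows (our illustration; the threshold value and helper names are assumptions, and the formal versions appear as algorithms 3.1 and 3.4 below):

```python
from dataclasses import dataclass

MAX_UTIL_THR_DC = 0.7  # illustrative static threshold on average CPU utilization

@dataclass
class Host:
    name: str
    cpu_utilization: float  # fraction of total MIPS currently in use
    active: bool = True

def is_peak_situation(hosts: list[Host]) -> bool:
    """Peak if the average CPU utilization across active hosts exceeds the
    threshold (cf. algorithm 3.4)."""
    utils = [h.cpu_utilization for h in hosts if h.active]
    return bool(utils) and sum(utils) / len(utils) > MAX_UTIL_THR_DC

def plan_optimization_pass(hosts: list[Host]) -> list[str]:
    """Context-aware pass (cf. algorithm 3.1): overload handling always runs;
    underload consolidation is skipped at peak to avoid power-off/on churn."""
    steps = ["migrate VMs away from overloaded hosts"]
    if not is_peak_situation(hosts):
        steps.append("drain and power off underloaded hosts")
    return steps

if __name__ == "__main__":
    dc = [Host("pm1", 0.90), Host("pm2", 0.85), Host("pm3", 0.10)]
    print(is_peak_situation(dc), plan_optimization_pass(dc))
```

The saving comes from the skipped branch: at peak, a drained host would very likely be powered on again within a few scheduling intervals, paying the power-off and power-on overheads twice.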
Figure 3.5 illustrates the modifications proposed to the VM placement optimization technique in the data center.

Table 3.1: Power And Performance Metrics From SPECpower Benchmark
(P = power consumption in watts and P2P = performance-to-power ratio, at target load levels of 10% through 100%)

HP ProLiant ML110 G4: P = 89.4, 92.6, 96, 99.5, 102, 106, 108, 112, 114, 117; P2P = 63.9, 116, 170, 222, 262, 313, 350, 394, 430, 467; Avg P2P = 268
HP ProLiant ML110 G5: P = 97, 101, 105, 110, 116, 121, 125, 129, 133, 135; P2P = 102, 195, 282, 354, 426, 494, 554, 618, 679, 731; Avg P2P = 431
HP ProLiant ML110 G3: P = 112, 118, 125, 131, 137, 147, 153, 157, 164, 169; P2P = 47.9, 89.4, 128, 160, 191, 218, 241, 268, 285, 309; Avg P2P = 190
IBM Server x3250: P = 46.7, 52.3, 57.9, 65.4, 73, 80.7, 89.5, 99.6, 105, 113; P2P = 665, 1205, 1621, 1930, 2143, 2361, 2466, 2539, 2685, 2767; Avg P2P = 2098
IBM Server x3550 [Xeon X5675]: P = 98, 109, 118, 128, 140, 153, 170, 189, 205, 222; P2P = 917, 1651, 2274, 2793, 3201, 3497, 3680, 3805, 3929, 4009; Avg P2P = 3093
IBM Server x3550 [Xeon X5670]: P = 107, 120, 131, 143, 156, 173, 191, 211, 229, 247; P2P = 861, 1528, 2094, 2568, 2933, 3200, 3363, 3491, 3603, 3694; Avg P2P = 2843

The data center load context is calculated based on the average CPU utilization of all active physical hosts. The load context parameter is configured to take two states, peak and non-peak, based on a static threshold technique. When the load context is set to the peak state, the power-off sequence and the migration of the VMs residing on underloaded hosts are skipped, to save power.

Figure 3.5: Load Context Aware VM Placement Optimization Process

3.6 Proposed context-aware VM placement optimization

In this section, the context-aware VM placement optimization technique is described. The objective of the proposed solution is to reduce the overall power consumption of the data center without any performance penalty for user applications. In our reported work, the global workload scheduler (GWS) is responsible for initiating the VM placement optimization process. The proposed technique regularly checks the host utilization (load) conditions by communicating with the local context manager (LCM) of each physical machine. The proposed work sets the VM optimization scheduling interval to 5 minutes, the interval used in the distributed resource scheduler (DRS) of VMware (Mosa and Paton, 2016). The proposed work considers the power efficiency of physical machines for VM placement and server consolidation decisions. Also, a technique for detecting the load context of the data center is defined, along with an alternative method, based on the load context, for handling host underload conditions.

3.6.1 VM placement optimization process

Algorithm 3.1, presented in this section, describes the steps of the proposed context-aware VM placement optimization process. In our reported work, the VM placement optimization algorithm is invoked by the global workload scheduler (GWS) module at fixed regular intervals in the data center (the invocation interval is set to 5 minutes in the proposed work).

The proposed algorithm first handles the host overload condition for all the active physical machines in the data center. The host load detection process checks each physical host for an overload condition and selects one or more VMs from each overloaded host that need to be migrated away to reduce its load. A new suitable destination physical host is searched for each VM that needs to be migrated out of an overutilized host, and the new (VM, PM) pair for the VM migration is added to the migrationList.
The context-aware algorithm then queries the current load context of the data center. If the data center is experiencing a peak load situation, the steps handling the host underload condition for each physical host are skipped to avoid unnecessary power-offs and VM migrations, as these physical hosts may need to be powered on again to meet surging resource demands. If the data center load condition is non-peak, the underutilized hosts are switched off, after migrating away all the VMs residing on them, to save power. The time complexity of algorithm 3.1 is O(n^2).

Algorithm 3.1: Context-aware VM placement optimization
Input: pmList
Output: migrationList
1  vmsForMigration = empty list  // initialize the list of VMs to migrate
2  foreach pm in pmList do
     // Identify VMs to be migrated from overloaded hosts
3    if isHostOverutilized(pm) then
4      Include VMs from the overutilized host into the list of VMs considered for migration
5      Find a suitable destination host for the VM migration
6      Add the (VM, destination PM) pair into migrationList
7    end
     // Query the current DC load context
8    if isNonPeakSituationInDc() then
       // Select VMs from underloaded hosts for migration
9      foreach pm in pmList do
10       if isHostInUnderloadedCondition(pm) then
11         Include all VMs residing on the underloaded host into the list of VMs considered for migration
12         Find suitable destination hosts for migrating all the VMs
13         Add the (VM, destination PM) pairs for all the VMs into migrationList
14       end
15     end
16   end
17 end
18 return migrationList

3.6.2 VM placement algorithm (PPABFD)

The algorithm for VM placement, called power and performance-aware best fit decreasing (PPABFD), a modified version of the power-aware best fit decreasing (PABFD) algorithm (Beloglazov and Buyya, 2012), is presented as algorithm 3.2. The proposed algorithm first sorts the list of VMs for migration in descending order of CPU utilization, and sorts the list of all physical hosts in the data center in descending order of their average performance-to-power ratios. This is done to use the average performance-to-power ratio AverPerf2Pow of the physical hosts (PMs) to prioritize PMs for VM placement requests. For each VM in the migration list, the physical hosts are checked for placement suitability, and the estimated power consumption after the VM placement is calculated for all the suitable physical hosts. The most energy- and performance-efficient physical machine among all the suitable hosts is selected.

Algorithm 3.2: Power and performance aware BFD (PPABFD)
Input: pmList, vmList
Output: VMAllocationList
1  Sort vmList in descending order of CPU utilization
2  Sort pmList in descending order of average performance-to-power ratio
3  foreach vm in vmList do
4    minPower = MAX_VALUE
5    PMAssigned = NULL
6    foreach pm in pmList do
7      if isSuitablePM(pm, vm) then
8        power = estimatedPower(pm, vm)
9        if power < minPower then
10         PMAssigned = pm
11         minPower = power
12       end
13     end
14   end
15   if PMAssigned != NULL then
16     VMAllocationList.add(vm, PMAssigned)
17   end
18 end
19 return VMAllocationList

The algorithm ensures that the host machines (PMs) with a higher AverPerf2Pow are prioritized for VM allocation, maximizing the utilization of the power-efficient physical machines to save power in the DC. PPABFD returns new VM-to-PM allocations that are efficient in terms of power and performance. The time complexity of algorithm 3.2 is O(n^2).
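A compact Python rendering of algorithm 3.2 follows, under simplifying assumptions (a linear idle-to-max power model stands in for estimatedPower(), single-dimensional CPU capacity stands in for the suitability check, and all names are illustrative):

```python
from dataclasses import dataclass

@dataclass
class VM:
    name: str
    mips: float  # requested CPU capacity

@dataclass
class PM:
    name: str
    mips_capacity: float
    avg_p2p: float      # AverPerf2Pow from equation (3.4)
    idle_w: float       # power draw when (nearly) idle
    max_w: float        # power draw at full load
    used_mips: float = 0.0

    def fits(self, vm: VM) -> bool:
        return self.used_mips + vm.mips <= self.mips_capacity

    def power_after(self, vm: VM) -> float:
        """Linear power model: idle draw plus a load-proportional share."""
        load = (self.used_mips + vm.mips) / self.mips_capacity
        return self.idle_w + (self.max_w - self.idle_w) * load

def ppabfd(vms: list[VM], pms: list[PM]) -> list[tuple[VM, PM]]:
    """Power and performance aware best fit decreasing (cf. algorithm 3.2)."""
    allocations = []
    # Hosts in descending AverPerf2Pow order; the strict '<' below then keeps
    # the earlier (more power-efficient) host on estimated-power ties.
    pms_sorted = sorted(pms, key=lambda p: p.avg_p2p, reverse=True)
    for vm in sorted(vms, key=lambda v: v.mips, reverse=True):  # big VMs first
        best, best_power = None, float("inf")
        for pm in pms_sorted:
            if pm.fits(vm):
                est = pm.power_after(vm)
                if est < best_power:
                    best, best_power = pm, est
        if best is not None:
            best.used_mips += vm.mips
            allocations.append((vm, best))
    return allocations
```

Sorting the host list by AverPerf2Pow changes nothing when one host's power estimate is strictly smaller, but it steers tied or equal-power decisions toward the power-efficient machines, which is the prioritization the prose above describes.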
3.6.3 Host underload condition

The host underload detection and switch-off process is essential in data centers to save power when the data center is not experiencing a heavy load. The host underload detection process detects physical hosts with a CPU utilization below a defined static threshold and selects the underutilized physical hosts for power-off. The VMs residing on these selected underloaded hosts are migrated away before the host power-off. However, the underloaded host selection technique should also consider the power efficiency of the underutilized physical hosts when selecting a particular host for power-off.

When the data center is experiencing fewer workload requests, the proposed VM placement optimization algorithm considers switching off the host machines with a lower power efficiency, to maximize the power saving benefit. Algorithm 3.3 presents our proposed host underload detection and underutilized host selection technique for power-off. The proposed technique takes into account the performance-to-power ratio AverPerf2Pow of the underutilized host machines when selecting a host to power off. Algorithm 3.3 ensures that the physical host (PM) with a CPU utilization below minUtilization that is the least power-efficient is switched off, thereby saving power during non-peak durations in the data center. The power-off of the physical host is performed only after all the VMs residing on it have been migrated away successfully. The time complexity of algorithm 3.3 is O(n).

Algorithm 3.3: Underloaded host detection algorithm
Input: pmList
Output: underUtilizedHost
   // Initialize the static threshold value for the under-utilization check
1  minP2PRatio = MAX_VAL
2  minUtilization = LOWER_THRESHOLD
   // Select the least power-efficient host with a CPU utilization below the threshold
3  foreach pm in pmList do
4    utilization = getCurrentUtilizationOfCpu(pm)
5    if (utilization > 0) && (utilization < minUtilization) then
6      power2PerfRatio = getPerf2PowerRatio(pm)
7      if power2PerfRatio < minP2PRatio then
8        underUtilizedHost = pm
9        minP2PRatio = power2PerfRatio
10     end
11   end
12 end
13 return underUtilizedHost

3.6.4 Load context detection in the data center

One of the objectives of the reported work in this thesis is to consider the global data center load conditions in the VM placement optimization process. We present a load context detection algorithm, with a defined static threshold utilization, as algorithm 3.4. The algorithm accesses the information stored by the local context manager (LCM) at each physical host, such as the VMs running on the host and their MIPS utilization, to arrive at the overall host CPU utilization. Once the host utilization data is summed over all hosts in the data center, the proposed solution calculates the average CPU utilization of the data center's servers. If the data center has an average CPU utilization of over MAX_UTIL_THR_DC, the proposed algorithm designates the current load context as the peak load duration; otherwise it is considered the normal/non-peak duration in the data center. The time complexity of algorithm 3.4 is O(n^2). The total host utilization (TotalHostUtilization) is calculated by summing the MIPS utilization of all VMs, as stored at the LCM of each physical host, and the TotalHostUtilization over all hosts is used to calculate the average host utilization in the data center (AverageHostUtilizationsInDc).
The average host utilization in the data center (AverageHostUtilizationsInDc) is compared against a defined threshold value of CPU utilization to trigger the peak load condition (i.e., to set isPeakSituationFlag). Algorithm 3.4 is invoked from algorithm 3.1 to obtain the current load context of the DC.

Algorithm 3.4: DC load context detection algorithm
Input: pmList
Output: isPeakSituationFlag
1  TotalHostUtilization = 0
2  AverageHostUtilizationsInDc = 0
3  isPeakSituationFlag = FALSE
   // Measure the total DC MIPS utilization
4  foreach pm in pmList do
5    utilization = 0
6    foreach vm in VMListOf(pm) do
7      utilization = utilization + getMipsUtilization(vm)
8    end
9    TotalHostUtilization = TotalHostUtilization + utilization
10 end
11 AverageHostUtilizationsInDc = TotalHostUtilization / NumHostsInDc
12 if AverageHostUtilizationsInDc > MAX_UTIL_THR_DC then
13   isPeakSituationFlag = TRUE
14 end
15 return isPeakSituationFlag

3.7 Experimental evaluation

The experimental evaluation of the proposed context-aware VM placement optimization technique for power saving has been carried out against a well-known adaptive heuristics-based technique for the dynamic consolidation of VMs (Beloglazov and Buyya, 2012).

3.7.1 Performance metrics

The performance metrics used in the evaluation of the proposed solution are described in this section.

1. Energy consumption: the total power consumption of all the physical hosts operating in the data center. Any decrease in the total power consumption of the data center implies reduced power costs for the data center owners.

2. Overall SLA violations: SLA violations occur because of the performance degradation caused by non-optimal VM-PM mappings. Performance degradation is due to resource shortages for co-located VMs, often caused by server over-utilization, and also by frequent migrations involving the same VM.

3. Total VM migrations: the number of re-mappings of VMs to the available physical machines (PMs) performed in the given time. A very high number of VM migrations may mean performance degradation and wastage of network bandwidth and of computing resources on the source and destination nodes. A very small number of VM migrations may mean non-adaptability to the dynamic situations in the data center.

4. Total host (PM) shutdowns: the number of times the host machines (PMs) are shut down in a given duration. Physical machines are switched off for power saving or for maintenance in data centers. Though PM shutdowns save power for the data center, frequent shutdowns may mean additional power consumption because of the PM start-up and shutdown procedures, and may also lead to hardware component failures in the physical machines over time.

3.7.2 Experimental setup

CloudSim (Calheiros et al., 2011) is used for the evaluation of the proposed context-aware technique of VM placement optimization for power and cost saving. CloudSim is a popular toolkit for the simulation and modeling of the cloud environment and its applications in the research community. CloudSim provides both behavioral and system modeling of cloud components. Simulation can help to evaluate the performance of proposed architectures, algorithms, and applications prior to their deployment in a highly dynamic, scalable, and distributed environment like the cloud.
CloudSim helps cloud developers test the accuracy and performance of their resource management and provisioning policies in a highly repeatable and controlled environment without any cost burden. CloudSim also helps to uncover bottlenecks and runtime issues before deployment on a real cloud. CloudSim provides essential classes for the modeling of data centers, service brokers, computing resources (CPU, RAM, network, etc.), virtual machines, users, applications, and also policies for the management of various system-level components such as resource scheduling and provisioning. Using the simulated cloud components, it is possible to evaluate new techniques governing the use of cloud resources by utilizing existing scheduling policies and load balancing algorithms or by adding new ones. It can also be used to test the competence of proposed techniques from various perspectives such as cost, power consumption, and execution time. The layered architecture of the CloudSim toolkit is shown in figure 3.6.

For the evaluation of our proposed solution, the heterogeneous data center is simulated using a composition of six different types of physical hosts (PMs) with the configurations listed in table 3.2. The experiments are conducted on an HP ProBook computer with a Core i5 CPU and 8 GB RAM, running the Windows 7 operating system. The duration of the simulation is set to one day, the duration also used in the evaluation of the heuristic-based solution (Beloglazov and Buyya, 2012).

Table 3.2: Physical Machine Configurations In The Data Center

No | Machine model name | MIPS | Cores | RAM (MB) | Network BW (GBs) | Type
1 | HP ProLiant ML110 G4 | 1860 | 2 | 4096 | 1 | Small
2 | HP ProLiant ML110 G5 | 2660 | 2 | 4096 | 1 | Small
3 | HP ProLiant ML110 G3 | 3000 | 2 | 4096 | 1 | Medium
4 | IBM server x3250 | 3067 | 4 | 8192 | 1 | Medium
5 | IBM server x3550 [Xeon X5675] | 3067 | 6 | 16384 | 1 | Big
6 | IBM server x3550 [Xeon X5670] | 2933 | 6 | 12288 | 1 | Big

Figure 3.6: CloudSim Layered Architecture

The proposed context-aware VM placement optimization technique is invoked once every 5 minutes in the data center, the interval used by the VMware distributed resource scheduler (DRS) (Mosa and Paton, 2016) to adjust the VM-PM mappings. The proposed solution is evaluated using two experiments carried out with workloads of two different natures and multiple distinct resource configurations in the data center.

The objective of the first experiment is to test the competence of our proposed solution against synthetic workloads with a variable number of VMs, simulating lightly loaded to heavily loaded scenarios in the data center. The experimental configuration is chosen to ensure that our proposed solution is useful under different load conditions. The aim of the second experiment is to appraise the competence of our proposed context-aware solution against real PlanetLab workload traces (PlanetLab, 2011) containing the CPU utilization data of 1033 VMs. A data center configuration composed of 400 PMs of the six different host types is used.

Table 3.3: Virtual Machine (VM) Configurations Used In The DC

No | VM type | [CPU MIPS, cores, RAM (MB), VM size (GB)]
1 | Type 1 [Extra Big] | [2500, 1, 870, 2.5]
2 | Type 2 [Big] | [2000, 1, 1740, 2.5]
3 | Type 3 [Small] | [1000, 1, 1740, 2.5]
4 | Type 4 [Extra Small] | [500, 1, 613, 2.5]
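To connect this setup to the placement sketch given after algorithm 3.2, the host and VM catalogues of tables 3.1-3.3 can be encoded as plain data (an illustrative encoding, not the CloudSim configuration itself; the near-idle power is approximated here by the 10%-load column of table 3.1):

```python
# (model, MIPS, cores, RAM MB, Avg P2P, ~idle W, max W) per tables 3.1/3.2
HOST_TYPES = [
    ("HP ProLiant ML110 G4",          1860, 2,  4096,  268,  89.4, 117.0),
    ("HP ProLiant ML110 G5",          2660, 2,  4096,  431,  97.0, 135.0),
    ("HP ProLiant ML110 G3",          3000, 2,  4096,  190, 112.0, 169.0),
    ("IBM server x3250",              3067, 4,  8192, 2098,  46.7, 113.0),
    ("IBM server x3550 [Xeon X5675]", 3067, 6, 16384, 3093,  98.0, 222.0),
    ("IBM server x3550 [Xeon X5670]", 2933, 6, 12288, 2843, 107.0, 247.0),
]

# (MIPS, cores, RAM MB, size GB) per table 3.3
VM_TYPES = {
    "extra-big":   (2500, 1,  870, 2.5),
    "big":         (2000, 1, 1740, 2.5),
    "small":       (1000, 1, 1740, 2.5),
    "extra-small": ( 500, 1,  613, 2.5),
}

# Experiment 1 uses only the G4 and x3250 types; with the PM/VM classes from
# the PPABFD sketch, a 100-host pool could be built roughly as:
#   pool = [PM(f"pm{i}", t[1], t[4], t[5], t[6])
#           for i, t in enumerate([HOST_TYPES[0], HOST_TYPES[3]] * 50)]
```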
3.7.3 Experiment 1: Synthetic workload with a variable number of VMs

The objective of the experiment is to test the proposed VM placement optimization technique under different load conditions (lightly to heavily loaded) in the data center. The lightly loaded and heavily loaded situations indicate the overall load on the PMs, considering the available physical resource capacity of the hosts and the resource demands of the co-located VMs. Five different configurations are chosen to test the different load conditions:

• Configuration 1.1: 100 VMs to be allocated to 100 PMs
• Configuration 1.2: 200 VMs to be allocated to 100 PMs
• Configuration 1.3: 250 VMs to be allocated to 100 PMs
• Configuration 1.4: 300 VMs to be allocated to 100 PMs
• Configuration 1.5: 400 VMs to be allocated to 100 PMs

The simulated cloud data center is composed of two physical machine types, HP ProLiant ML110 G4 and IBM server x3250, with the configurations shown in table 3.2; all the VM configurations shown in table 3.3 are used to create VMs. Cloudlets are programmed to generate utilization data every 5 minutes based on a stochastic model (Calheiros et al., 2011). The energy consumption results of the proposed solution, along with those of the heuristic-based solution (Beloglazov and Buyya, 2012), for the five configurations of the synthetic workload are plotted in figure 3.7.

Figure 3.7: Comparison Of Power Consumption Results For Synthetic Workload

The results suggest that the proposed solution saves approximately 8-10% energy in the lightly and heavily loaded cases and 2-6% in the moderately loaded cases in the data center. The power saving achieved can be attributed to the power-efficiency-aware VM placement and the load-context-based optimizations of the VM placement. The results for all of the performance metrics for experiment 1 are tabulated in table 3.4.

The total VM migrations, plotted in figure 3.8 for experiment 1 (synthetic workload), show a steep increase from the lightly loaded to the heavily loaded scenarios with the heuristics-based technique (Beloglazov and Buyya, 2012), due to the high number of host shutdowns and the non-optimal VM-PM mappings. With the proposed solution, however, the number of VM migrations shows only a small increase from the lightly loaded to the heavily loaded scenarios. This indicates that the proposed context-aware technique is able to generate a better allocation strategy because of the power-efficiency-based prioritization of hosts for VM allocation and the DC-load-aware server consolidation strategy.

Figures 3.9 and 3.10 indicate that the overall SLA violations and the total host shutdowns recorded for experiment 1 are much smaller in the case of the proposed solution when compared with the heuristic-based past work, for all the configurations. The proposed technique can avoid unnecessary host shutdowns and VM migrations by considering the load context in the data center.
Table 3.4: Evaluation Results for Performance Metrics for the Synthetic Workload
(H = heuristics-based (Beloglazov and Buyya, 2012); P = proposed solution)

| Number of VMs | Energy (kWh) H / P | VM migrations H / P | SLA perf. degradation due to migration H / P | Overall SLA violation H / P | Host shutdowns H / P |
|---------------|--------------------|---------------------|----------------------------------------------|-----------------------------|----------------------|
| 100           | 22.51 / 20.69      | 6102 / 412          | 0.15% / 0.01%                                | 0.04% / 0.01%               | 1155 / 175           |
| 200           | 43.96 / 41.57      | 11156 / 921         | 0.13% / 0.01%                                | 1.07% / 0.02%               | 1969 / 277           |
| 250           | 54.79 / 52.24      | 13001 / 866         | 0.12% / 0.01%                                | 0.83% / 0.01%               | 2304 / 249           |
| 300           | 64.72 / 63.14      | 15107 / 1070        | 0.12% / 0.01%                                | 0.96% / 0.02%               | 2607 / 289           |
| 400           | 89.55 / 82.50      | 22068 / 1100        | 0.14% / 0.01%                                | 1.84% / 0.01%               | 3003 / 292           |

Figure 3.8: VM Migrations Results For Synthetic Workload

3.7.4 Experiment 2: Real-world workload with multiple PM types

The objective of experiment 2 is to evaluate the competence of the proposed VM placement optimization solution using a real-world workload in the data center. Three configurations are chosen to test different levels of heterogeneity, with host machine types of varying power and performance characteristics.

• Configuration 2.1: DC with 2 host machine types
• Configuration 2.2: DC with 4 host machine types
• Configuration 2.3: DC with 6 host machine types

The cloud data center for experiment 2 is simulated using the six types of physical machines with the configurations listed in table 3.2, consisting of 400 PMs. The experiment utilizes a real-world workload consisting of resource utilization data of 1033 VMs (PlanetLab, 2011) captured on PlanetLab servers. The physical machine (PM) types used in the three configurations are tabulated in table 3.5, and their configurations can be found in table 3.2. Figure 3.11 shows the power consumption details of the proposed solution and the adaptive heuristics-based technique (Beloglazov and Buyya, 2012). It can be noted from the figure that our proposed solution saves approximately 1-3% power compared to the heuristics-based technique.

Figure 3.9: Overall SLA Violations Results For Synthetic Workload

The results for all the performance metrics for the real-world workload (experiment 2) are tabulated in table 3.6. The graph of total VM migrations in figure 3.12 indicates that the VM migrations for the proposed context-aware technique are much smaller in number than for the heuristics-based method in all configurations. The overall SLA violations and total host shutdowns recorded during experiment 2 are also much smaller for the proposed solution, as shown in figure 3.13 and figure 3.14. The proposed technique can avoid a higher number of VM migrations by providing near-optimal VM placement and by adopting a load-aware underutilized host management technique, compared to the adaptive heuristics-based technique proposed earlier (Beloglazov and Buyya, 2012). The performance evaluation results suggest that the proposed context-aware VM placement optimization solution performs better than the heuristics-based technique for power consumption minimization and improves operational efficiency by reducing VM migrations, host shutdowns, and SLA violations in both experiments. The proposed context-aware VM placement optimization technique can reduce power consumption by 2-10% for synthetic workloads and by 1-3% for real workload traces in the data centers.
The key differentiating factors between the proposed context-aware solution and the heuristics-based technique are the use of the performance and power characteristics of the physical machines and the detection of the global load context of the data center to improve the efficiency of VM placement optimization.

Figure 3.10: Number Of Host Shutdowns For Synthetic Workload

Table 3.5: Physical Machine Types Used in Experiment 2

| Configuration name | Host (PM) types |
|--------------------|-----------------|
| Configuration 2.1  | HP ProLiant ML110 G4, IBM server x3250 |
| Configuration 2.2  | HP ProLiant ML110 G4, IBM server x3250, HP ProLiant ML110 G5, IBM server x3550 [Xeon X5675] |
| Configuration 2.3  | HP ProLiant ML110 G4, IBM server x3250, HP ProLiant ML110 G5, IBM server x3550 [Xeon X5675], HP ProLiant ML110 G3, IBM server x3550 [Xeon X5670] |

Figure 3.11: Power Consumption Results For Real Workload

Table 3.6: Evaluation Results of Performance Metrics for the Real-World Workload
(H = heuristics-based; P = proposed solution)

| Types of Physical Servers | Energy (kWh) H / P | VM migrations H / P | SLA perf. degradation due to migration H / P | Overall SLA violation H / P | Host shutdowns H / P |
|---------------------------|--------------------|---------------------|----------------------------------------------|-----------------------------|----------------------|
| 2                         | 40.95 / 40.41      | 16102 / 1875        | 0.07% / 0.00%                                | 0.08% / 0.00%               | 2211 / 377           |
| 4                         | 41.25 / 40.49      | 15855 / 1828        | 0.07% / 0.00%                                | 0.08% / 0.00%               | 2228 / 382           |
| 6                         | 40.97 / 39.73      | 15941 / 1716        | 0.07% / 0.00%                                | 0.08% / 0.00%               | 2217 / 378           |

Figure 3.12: VM Migrations Results For Real Workload
Figure 3.13: Overall SLA Violations For Real Workload
Figure 3.14: Number Of Host Shutdowns Reported For Real Workload

3.8 Summary

The chapter first introduced the technique of VM placement optimization and its sub-tasks. The chapter then presented the research objective of overall power saving in the data center, described the physical machine characteristics and their application to power saving, and discussed load condition detection and its use in underutilized host management. The proposed algorithms for VM placement optimization, VM placement, underutilized host selection, and DC load context detection are presented. Finally, the evaluation of the proposed technique with both synthetic and real-world workloads is described. The results obtained for the proposed technique suggest that a power saving of 2-10% for synthetic workloads and 1-3% for real-world workloads is achieved.

In the next chapter, an electricity cost-aware request routing (load distribution) algorithm for the cloud service broker is presented for power cost optimization in the geographically distributed data centers scenario.

Chapter 4
Electricity cost-aware load balancing in geo-distributed data centers

The adoption of cloud services by businesses across the globe is growing rapidly, and many new services and customers consuming these services are added at an ever-increasing pace. Because of this growth, cloud providers like Amazon, Microsoft, and Google have set up many geographically dispersed data centers, and they continue to build more to support the computing demands of their user bases. For internet applications like those hosted on the cloud, speed and latency are of utmost importance. This necessity creates a motivation for building geographically distributed data centers around the world to reduce speed-of-light delays for user applications hosted on the cloud and accessed around the globe. But such a distributed setup of data centers over various geo-locations creates a new set of research problems and opportunities.
One such research problem, addressed in this chapter, is determining how to distribute the user application traffic (load) across geographically dispersed data centers to minimize the cost for data center providers.

Data centers need huge capital investments at the beginning for setting up IT and non-IT infrastructure, and later incur management costs for data center maintenance and power (electricity) consumption to keep the data center up for 24x7 operations. It is noted that 15% of the overall amortized data center costs (Greenberg et al., 2009) correspond to power/electricity cost.

Electricity is generated using various methods across the world, and its availability and volume are not uniformly distributed. The cost of electricity at a geographical location depends on various factors like the availability of natural resources, the technology involved in generation, and the cost of the infrastructure needed for generation. Electricity costs are also found to vary with the time of day, the total units consumed, etc., based on the domestic rules of each country. It is essential to minimize data center management costs for cloud providers to help reduce the cost of ownership of a large-scale computing facility like cloud data centers. Distributed data centers provide an opportunity to exploit the electricity price variability across the globe to optimize power costs.

The rest of this chapter is organized as follows: section 4.1 briefly introduces the functions of the cloud service broker; section 4.2 presents the objective of our proposed work. The power cost-aware technique to load balance user requests among geographically dispersed data centers is described in section 4.3, the experimental setup and configurations are presented in section 4.4, and the experimental results are discussed in section 4.5.

4.1 Background study

The cloud service broker is responsible for controlling traffic routing between users and data centers in a geographically distributed data center setup. The cloud service broker distributes the user requests for cloud applications across the multiple available DCs based on a load balancing algorithm/policy. Figure 4.1 shows the functions of the service broker module in a cloud computing environment.

Figure 4.1: Cloud Application Service Broker

The commonly used cloud service broker routing policies (Wickremasinghe et al., 2010) are listed below.

• Proximity-based routing - The closest data center in terms of transmission delay is chosen for routing.
• Performance-optimized routing - The performance of all data centers is monitored, and traffic is routed to the data center that is estimated to give the best response time to the user.
• Dynamically re-configuring routing - This is very similar to proximity-based routing, but it has the additional responsibility of scaling the load of a data center by increasing or decreasing VM allocation based on current performance compared against the best performance ever achieved with that data center.

The proposed work in this chapter describes a new task/request distribution algorithm for the service broker to optimize the cost of power consumption for data center owners without affecting the performance of user applications.

4.2 Research objective

The objective of the proposed work is to route more requests (load) to data centers where the electricity/power cost is cheaper at that point in time, so as to optimize the total power cost, while ensuring that the response time is the same as or better than that of the closest data center.
The proposed power cost optimization problem can be stated mathematically as follows:

EC(N) = \sum_{i=1}^{N} n(i) \, E(i,t) \, P_c \qquad (4.1)

In equation 4.1, EC(N) denotes the total cost of electricity (power) for N data centers, n(i) represents the number of user requests processed by the i-th data center, E(i,t) is the electricity cost at the i-th data center location at time t, and P_c denotes the electricity consumed by a server per unit request, which can be considered a constant value. The objective of the proposed work is to minimize the value of EC(N).

4.3 Electricity cost-aware cloud service broker policy

The proposed technique aims to leverage the varying electricity prices around the world to optimize power costs for data center owners. The proposed technique for the cloud service broker distributes the user compute workload (requests) among the available data centers by incorporating the electricity prices prevailing in the DC regions as a decision parameter.

The electricity price is modelled as a two-dimensional context variable that varies with both place and time (or amount of consumption). The 2-D table used to represent electricity prices is referred to as the electricity cost matrix (or EC matrix) in this report and can be represented as shown in table 4.1.

Table 4.1: Electricity Cost Matrix Representation

| Geo Location  | 00:00-5:00 | 5:01-9:00 | 9:01-19:00 | 19:01-23:59 |
|---------------|------------|-----------|------------|-------------|
| DC Location X | x1         | x2        | x3         | x4          |
| DC Location Y | y1         | y2        | y3         | y4          |
| DC Location Z | z1         | z2        | z3         | z4          |
| ...           | ...        | ...       | ...        | ...         |
| DC Location N | n1         | n2        | n3         | n4          |

This EC matrix has one row for each of the data centers, and each of the columns indicates another parameter with which the electricity cost varies for that geo-location, for example the time of day, as shown in table 4.1. The EC matrix should be updated by the administrator based on domestic rules and made available to the cloud service broker at all times.

The proposed cost-aware algorithm placed at the cloud service broker accesses the following details about all the available geographically dispersed data centers.

1. The closest data center to the request in terms of transmission delay, and its estimated response time.
2. The updated EC matrix containing the electricity prices of all DC locations.
3. The estimated response times of all data centers for the current request.

The criteria of the proposed cost-aware service broker algorithm for allocating requests to a cheaper data center in terms of electricity prices are as follows.

1. The data center should have a response time lower than that of the closest data center.
2. The electricity cost of the selected data center should be lower than that of the other available data centers that satisfy the first criterion.

The cost-aware algorithm for the cloud service broker is presented in algorithm 4.1. The proposed algorithm 4.1 accesses the details of the available geo-distributed data centers (allDataCenters) and the updated EC matrix (allDccosts) during the initialization process. The algorithm accepts the user base location of the incoming request as input and finds the closest DC (closestDc) to the corresponding user base location using the transmission delay matrix. The algorithm then calculates the estimated response times for all the available data centers (allDcEstTime) for the current request using the network delay and the last recorded response time (bestRecordedresponseTime) from the corresponding DC. The estimated response time of the closest DC (closestEstResponseTime) for the current request is also calculated.
Once the required parameters, such as the estimated response times for all candidate DCs including the closest DC (allDcEstTime) and the electricity cost matrix (allDccosts), are available for the current request, the cost-aware algorithm finds the data center id (dest) for which the estimated response time is smaller than the closest DC's estimated response time (closestEstResponseTime) and the electricity price (per-unit price for power) is lower than that of the closer DC. The selected data center id (dest), for which there exists an estimated response time (leastEstRespTime) lower than the closest DC's and which has the electricity price advantage, is returned for request assignment; otherwise, the closest data center id (closestDc) is returned for the incoming request.

Algorithm 4.1: Electricity cost-aware request routing technique
Result: Finds a cheaper DC with the best response time
Input : src - source location of the request
Output: dest - data center id for request routing
/* Initialization */
1  allDataCenters = getAvailableDatacenterIds();
2  allDccosts = getECCostsforDCs(allDataCenters);
3  allDcEstTime = MAXTIME;
4  closestDc = findClosestDc();
/* Calculate estimated response times for all DCs */
5  foreach DataCenterId ∈ allDataCenters do
6      nwDelay = getNetworkDelay(src, DataCenterId);
7      bestRecordedresponseTime = getBestResponseTime(src, DataCenterId);
8      currEstResponseTime = nwDelay + bestRecordedresponseTime;
9      allDcEstTime[DataCenterId] = currEstResponseTime;
10     if DataCenterId == closestDc then
11         closestEstResponseTime = currEstResponseTime;
12     end
13 end
14 dest = closestDc;
15 leastEstRespTime = closestEstResponseTime;
/* Find the fastest and cheapest DC */
16 foreach DataCenterId ∈ allDataCenters do
17     if allDcEstTime[DataCenterId] < leastEstRespTime then
18         if getECCost(DataCenterId) < getECCost(dest) then
19             dest = DataCenterId;
20             leastEstRespTime = allDcEstTime[DataCenterId];
21         end
22     end
23 end
24 return dest;

The time complexity of algorithm 4.1 is O(n). The proposed cost-aware technique ensures that the response time for request processing is no worse than that of the closest DC, so there is no degradation of service quality, while geo-distributed data center owners save power cost.
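The routing logic of algorithm 4.1 can be summarized in Java as below. This is a minimal sketch: the BrokerContext interface stands in for the simulator internals (delay matrix, recorded response times) and is a hypothetical abstraction, not an actual CloudAnalyst API.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical view of the broker's surroundings: delays and recorded response times. */
interface BrokerContext {
    int findClosestDc(int src);
    double getNetworkDelay(int src, int dcId);
    double getBestResponseTime(int src, int dcId);
}

public class CostAwareServiceBroker {

    private final List<Integer> allDataCenters;     // available DC ids
    private final Map<Integer, Double> allDcCosts;  // per-DC price for the current time slot
    private final BrokerContext ctx;

    public CostAwareServiceBroker(List<Integer> dcs, Map<Integer, Double> costs, BrokerContext ctx) {
        this.allDataCenters = dcs;
        this.allDcCosts = costs;
        this.ctx = ctx;
    }

    /** Mirrors algorithm 4.1: returns the DC id chosen for a request from location src. */
    public int route(int src) {
        int closestDc = ctx.findClosestDc(src);
        int dest = closestDc;
        double leastEstRespTime = estimate(src, closestDc);

        for (int dc : allDataCenters) {
            double est = estimate(src, dc);
            // Accept a candidate only if it is faster than the current best
            // AND cheaper (per kWh) than the currently selected DC.
            if (est < leastEstRespTime && allDcCosts.get(dc) < allDcCosts.get(dest)) {
                dest = dc;
                leastEstRespTime = est;
            }
        }
        return dest;  // falls back to the closest DC when no candidate qualifies
    }

    private double estimate(int src, int dc) {
        // Estimated response time = network delay + best recorded service time.
        return ctx.getNetworkDelay(src, dc) + ctx.getBestResponseTime(src, dc);
    }
}
```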
4.4 Experimental setup

The proposed technique is evaluated using the CloudAnalyst (Wickremasinghe et al., 2010) toolkit, widely used by researchers as a simulation tool for evaluating the competence of cloud computing resource management policies and applications. This section provides a brief introduction to the CloudAnalyst tool and the experimental configuration parameter settings used for our evaluation.

4.4.1 CloudAnalyst

CloudAnalyst (Wickremasinghe et al., 2010) is a GUI-based open-source cloud simulation tool that supports simulation and visual modeling of large-scale cloud applications. CloudAnalyst is built on top of CloudSim and provides many additional extended features to describe application workloads, geographically dispersed data centers, and distributed user bases, and it also supports configuring the numbers and settings of hardware/software resources in data centers. With CloudAnalyst, application developers and researchers can develop and evaluate resource provisioning, scheduling, and application deployment strategies for distributed data centers and users. The block diagram of CloudAnalyst is shown in figure 4.2. It is built on the CloudSim framework and extends some of its classes to model complex internet and application parameters. CloudAnalyst provides a GUI layer to aid in conducting quick and complex experiments with a high degree of flexibility and ease. Because CloudAnalyst is an open-source simulator built using a modular design, it is easy to extend the tool to support a new feature or to modify its behavior to support a new perspective, such as cost of service.

Figure 4.2: Block Diagram Of CloudAnalyst

4.4.2 Experimental configurations

This section explains the experimental configurations used for the evaluation of the proposed cost-aware technique for request routing in a geo-dispersed data center setup.

A Electricity price for all DC locations

The price of electricity in the various geo-regions where data centers are set up is given in table 4.2. The sample electricity cost/price values used in the experiments are based on a Wikipedia source (Wikipedia, 2017) available on the web. Table 4.2 is referred to as the EC matrix. The EC matrix considered for the evaluation shows variability based on geo-location but does not change with any other parameter, like time of day or units consumed, in the experiments.

Table 4.2: Electricity Cost Table

| Data center Name (Location) | Electricity Cost (in $/kWh) |
|-----------------------------|-----------------------------|
| DC1 (USA)                   | 0.17                        |
| DC2 (Brazil)                | 0.25                        |
| DC3 (UK)                    | 0.21                        |
| DC4 (China)                 | 0.24                        |
| DC5 (Africa)                | 0.13                        |
| DC6 (Australia)             | 0.22                        |

B Data centers and user bases

The evaluation of the proposed technique considers users at six different geographical locations accessing cloud services from data centers located at six geographic locations around the world. The data centers have the configurations shown in table 4.3 for the experiments.

Table 4.3: Data Center Configurations

| DC Name | Region    | Number of VMs | Bandwidth (in Mbps) |
|---------|-----------|---------------|---------------------|
| DC1     | USA       | 500           | 1000                |
| DC2     | Brazil    | 500           | 1000                |
| DC3     | UK        | 500           | 1000                |
| DC4     | China     | 500           | 1000                |
| DC5     | Africa    | 500           | 1000                |
| DC6     | Australia | 500           | 1000                |

Table 4.4 lists the user base configurations of the six geographical regions used for the experiments. The rest of the settings, like hypervisor, OS, memory, and hardware configuration, are considered uniform for all DCs. The experiment duration is set to 60 hours.

Table 4.4: User Base Configurations

| User Base | Region    | Req/Hr | Req Size | Avg Peak Users | Avg Non-Peak Users |
|-----------|-----------|--------|----------|----------------|--------------------|
| UG1       | USA       | 60     | 100      | 1000           | 100                |
| UG2       | Brazil    | 60     | 100      | 1000           | 100                |
| UG3       | UK        | 60     | 100      | 1000           | 100                |
| UG4       | China     | 60     | 100      | 1000           | 100                |
| UG5       | Africa    | 60     | 100      | 1000           | 100                |
| UG6       | Australia | 60     | 100      | 1000           | 100                |

The transmission delay matrix given in table 4.5 is used in the experiments to find the closest data center for any request received from the user bases at the cloud broker for DC assignment.

Table 4.5: Transmission Delay Matrix Between Regions (in msec)

| Regions   | USA | Brazil | UK  | China | Africa | Australia |
|-----------|-----|--------|-----|-------|--------|-----------|
| USA       | 25  | 100    | 150 | 250   | 250    | 100       |
| Brazil    | 100 | 25     | 250 | 500   | 350    | 200       |
| UK        | 150 | 250    | 25  | 150   | 150    | 200       |
| China     | 250 | 500    | 150 | 25    | 500    | 500       |
| Africa    | 250 | 350    | 150 | 500   | 25     | 500       |
| Australia | 100 | 200    | 200 | 500   | 500    | 25        |

4.5 Experimental results and analysis

The experiments are conducted using multiple combinations of the user bases and data centers, and the results are presented in this section. The experiments are performed for five different categories of user groups and data centers. The format used to represent each category is as follows.

(User groups), (geo-distributed data centers)

For example, (UG1), (DC3, DC4, DC5, DC6) implies that user group 1, located in the USA region, can access the services from data centers located in the United Kingdom (UK), China, Africa, and Australia for this category of the experiment.
Table 4.6 tabulates the request assignments to the closest data center and the cheapest data center for the five categories of experiments. It can be observed from table 4.6 that for experiments E4 and E5, the proposed technique is able to find a cheaper data center with an estimated response time smaller than that of the closest data center and assign a significant number of incoming requests to it.

Table 4.7 summarizes the total power costs of the closer-DC and cheaper-DC assignments for the proposed cost-aware technique, along with the power costs of closest-DC-only (proximity-based routing) assignments for the five categories of experiments. The power consumed per unit request is considered a constant (Pc) of 0.1 kWh for the experiments, and the power costs are calculated using the EC matrix per geo-location, as shown in equation 4.1. It can be noted from table 4.7 that the assignment of requests to cheaper data centers in experiments E4 and E5 has reduced power costs by 15-23%.

Table 4.6: Proposed Service Broker Request Assignments

| Exp Name | Experimental Combination      | Total requests received | Assignments to Closest DC | Assignments to Cheapest DC |
|----------|-------------------------------|-------------------------|---------------------------|----------------------------|
| E1       | (UG1), (DC3,DC4,DC5,DC6)      | 69425                   | 69109                     | 316                        |
| E2       | (UG2), (DC3,DC4,DC5,DC6)      | 69425                   | 65340                     | 4085                       |
| E3       | (UG1,UG2), (DC3,DC4,DC5,DC6)  | 139063                  | 134820                    | 4243                       |
| E4       | (UG4), (DC2,DC5,DC6)          | 69365                   | 35524                     | 33901                      |
| E5       | (UG3,UG4), (DC2,DC5,DC6)      | 139063                  | 105040                    | 34023                      |

Table 4.7: Summary of Power Costs for the Proposed Technique

| Experiment | Closest DC Cost | Cheaper DC Cost | Total Cost (Cost-aware) | Total Cost (Closest DC only) | Cost Saving |
|------------|-----------------|-----------------|-------------------------|------------------------------|-------------|
| E1         | $1520.39        | $6.63           | $1527.03                | $1527.34                     | 0.02%       |
| E2         | $1437.47        | $85.68          | $1523.16                | $1527.34                     | 0.27%       |
| E3         | $2966.03        | $88.99          | $3055.03                | $3059.38                     | 0.14%       |
| E4         | $888.09         | $440.71         | $1328.81                | $1735.62                     | 23.43%      |
| E5         | $1790.34        | $442.29         | $2232.64                | $2640.91                     | 15.45%      |

Figure 4.3 summarizes the percentage-wise assignment of the proposed technique for load (request) distribution to the available data centers.

Figure 4.3: Request Assignment Percentage

The data center selection criterion of the proposed cost-aware algorithm is to find a cheaper data center with a better response time than the closest data center. It can be observed from the results that the E1 category has fewer assignments (0.4%) to cheaper data centers because the data center DC6, which is located close (in terms of transmission delay) to the user base of the requests, also has an estimated response time smaller than the other competing data centers for most of the requests. The E2 and E3 categories have 3-6% of request assignments to cheaper data centers, occurring whenever the closely located data center DC6 has a higher estimated response time for a request than the cheaper data center DC3. It can be noted from experiment categories E4 and E5 that the cost-aware service broker technique is able to allocate 24-49% of requests to cheaper data centers, with significant power cost savings for data center owners. Figure 4.4 presents the power costs for both the proposed cost-aware technique and closest-DC-only allocations to indicate the total power cost saving achieved. It can be noted from experiment categories E4 and E5 that the proposed cost-aware technique for the cloud service broker can save 15-23% of power costs for cloud data center owners.

Figure 4.4: Comparison Of Power Costs

It is evident from the evaluation results that the proposed cost-aware request routing algorithm saves 15-23% of power costs for data center owners when there exists an opportunity to route requests (processing load) to cheaper data centers with no degradation in response times for user requests.
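As a concrete check of this accounting, the cheaper-DC cost reported for E4 in table 4.7 follows directly from equation 4.1, assuming (consistent with figure 4.3) that the 33,901 cheaper assignments in E4 are served by DC5 (Africa, $0.13/kWh):

33901 \times 0.1\ \text{kWh} \times 0.13\ \$/\text{kWh} \approx \$440.71

which matches the Cheaper DC Cost entry for E4 in table 4.7.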
4.6 Summary

This chapter proposed an electricity cost-aware request routing technique to distribute tasks to data centers in a geographically distributed data center setup. The chapter introduced the cloud service broker module and three of the well-known request routing techniques employed by the cloud service broker. The chapter then presented the research objective addressed and described the proposed cost-aware request routing algorithm in detail. The experimental setup and a discussion of the results were presented to demonstrate the effectiveness of the proposed solution for power cost saving.

The next chapter describes the equally spread current execution (ESCE) load balancing algorithm and then discusses a problem observed with it during peak load conditions. The chapter proposes a resolution to the problem, and the experimental evaluation of the proposed solution is presented at the end.

Chapter 5
Peak Hour Performance Improvement for ESCE Algorithm

In recent years, cloud computing has witnessed explosive growth because of the advancement of networking technology and the ease with which cloud services (computing hardware and software) can be rented and operated. Cloud computing has finally made the idea of offering computing as a utility a reality, and since then the cloud has been embraced by millions of users across the world and also by giant IT companies like Amazon, HP, IBM, Microsoft, Apple, Google, Oracle, and others. The scalability and efficiency features of the cloud can only be achieved by proper management (utilization) of cloud resources. An essential characteristic of the cloud is the ability to manage and access cloud resources in virtual form. Users access cloud resources by submitting their work requests to virtual entities of computing called virtual machines (VMs), on a rental basis.

It is vital to balance the work requests (load) from users across the available virtual machines to achieve resource efficiency through optimal utilization of the underlying computing resources. Load balancing in cloud data centers is done over both physical hosts and VMs. In the case of VMs, the load balancing algorithm distributes the cloud users' dynamic workload equally among all the VMs. The performance of the load balancing mechanism is critical during peak hours in data centers to meet stringent performance SLAs through optimal utilization of computing resources. Over- and under-allocation of load to even a few VMs can cause performance degradation and SLA violations. Our work reported in this chapter investigates the equally spread current execution load (ESCE) algorithm for a problem in maintaining uniform resource utilization during high-traffic situations and proposes a solution to address the problem.

5.1 Background study

This section explains the user task scheduling model in cloud data centers and briefly describes the ESCE algorithm for task load balancing.

5.1.1 Task scheduling in cloud data centers

The model used for task scheduling in the cloud data center is shown in figure 5.1. The cloud system contains N hosts, each running more than one VM. Load balancing is required in a system where a huge number of input tasks submitted to the cloud need to be assigned to a finite set of virtual machines. The VM manager (Mishra et al., 2018) receives the input tasks submitted to the cloud system from the task queue.
The VM manager has information about the active VMs available in the cloud data centers and the available resources on the different hosts. If the available resources are enough to complete the submitted tasks, the tasks are forwarded to the task scheduler, called the task load balancer. If enough resources are not available to process the input tasks, new VMs are created in the data center to cater to the additional resource demands. The task scheduler acts as a load balancer to map each task to the available VMs based on the resource requirements of each task and the current load on each active VM.

Figure 5.1: Task Scheduling Model In Cloud

5.1.2 Equally spread current execution load algorithm (ESCE)

The equally spread current execution (ESCE) load algorithm (Mali and Vidya, 2013), also known as the active VM load balancer, is a tasks-to-VM load balancer. The objective of the ESCE algorithm is to spread the execution load equally over the different VMs in a data center to achieve uniform resource utilization. The active VM load balancer maintains a VM table with each VM id and the number of requests currently allocated to that VM id. When a task (request) is submitted to the data center task queue for execution, the load balancer searches the VM table for the least-loaded VM (the VM with the fewest request assignments). If more than one VM is found with an equal number of task assignments, the first identified VM is selected and mapped for the task execution. The load balancer updates the VM table by incrementing the allocation count of the identified VM. When a VM finishes the execution of an allotted task, the load balancer again updates the VM table by decrementing the allocation count of that VM by one. The VM identification and VM allocation count update steps are performed in response to event triggers from the data center controller.

Table 5.1: Example of VM Allocation by the ESCE Algorithm

| Task ID | VM ID 0 | VM ID 1 | VM ID 2 |
|---------|---------|---------|---------|
| Init    | 0       | 0       | 0       |
| 0       | 1       | 0       | 0       |
| 1       | 1       | 1       | 0       |
| 2       | 1       | 1       | 1       |
| 3       | 2       | 1       | 1       |

An example of VM allocation by the ESCE algorithm is shown in table 5.1. The table shows a scenario where tasks are allocated to 3 VMs in the data center using the ESCE VM allocation algorithm. Initially, all VM ids contain zero allocations, as indicated in the first row of table 5.1. It can be noted that the ESCE algorithm found a VM id with the minimum allocation (zero in this case) for task ids 0, 1, and 2. When task id 3 is requested for allocation, all VM ids have an equal number of allocations; hence, for task id 3, VM id 0 is allocated to the task.

A problem with the uniform utilization of VMs is observed with the active VM (ESCE) algorithm during high-traffic situations. The VM table update process in ESCE is invoked by the data center controller (DCC) task allocation and de-allocation events. When the data center controller requests the ESCE algorithm for the least-loaded VM id for allocation, the VM id is found and returned to the DCC. However, the VM table update is deferred until the VM id is mapped to the task by the data center controller and a notification of the allocation is sent to ESCE (the active VM load balancer). If, during the VM identification, VM allocation, and VM table update process, any new requests for VM identification are received by the ESCE algorithm, the VM table does not reflect the current state of the system, which causes state inconsistency. This VM table state inconsistency problem is frequently observed during peak hours, when a huge number of tasks are submitted to the cloud system for processing.
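The deferred-update behavior just described can be made explicit with a short sketch of the original selection step. This is an illustrative reconstruction of ESCE's logic, not the simulator's source code: findLeastLoadedVm does not touch the VM table, so any queries arriving before onAllocationNotification see stale counts and keep returning the same VM.

```java
import java.util.Map;

/** Illustrative reconstruction of the original ESCE (active VM) selection step. */
public class EsceLoadBalancer {

    // VM id -> number of requests currently allocated (the VM table).
    private final Map<Integer, Integer> vmTable;

    public EsceLoadBalancer(Map<Integer, Integer> vmTable) {
        this.vmTable = vmTable;
    }

    /** Returns the first VM with the fewest current allocations. */
    public int findLeastLoadedVm() {
        int bestVm = -1;
        int bestCount = Integer.MAX_VALUE;
        for (Map.Entry<Integer, Integer> entry : vmTable.entrySet()) {
            if (entry.getValue() < bestCount) {
                bestVm = entry.getKey();
                bestCount = entry.getValue();
            }
        }
        return bestVm;  // the VM table is NOT updated here: the inconsistency window opens
    }

    /** Invoked only later, when the data center controller confirms the allocation. */
    public void onAllocationNotification(int vmId) {
        vmTable.merge(vmId, 1, Integer::sum);  // closes the window
    }
}
```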
The work discussed in this chapter offers a resolution to this problem.

5.2 Research objective

The objective of the proposed work in this chapter is to achieve uniform resource utilization across all VMs at all times in the data center for the ESCE task load balancing algorithm. The proposed work investigates the problem of over-allocation of VMs with the ESCE algorithm during high-traffic situations and proposes a solution to overcome the problem.

5.3 Proposed VM load balancer

The proposed VM load balancer is a modified version of the ESCE algorithm (Mali and Vidya, 2013) that solves the problem of over-allocation of VMs in peak-hour situations. The proposed algorithm uses an intermediate VM reservation table to record the VM recommendations sent to the data center controller until the identified VM is allocated to the input task by the data center controller and a notification is received by the proposed VM load balancer. The proposed load balancer takes into account both the VM table and the reservation table entries when identifying the least-loaded VM in the set of available VMs in the data center. The proposed VM load balancer algorithm is presented in algorithm 5.1.

Algorithm 5.1: Modified active VM (ESCE) algorithm
Result: Finds the least-loaded VM for the given request
Input : VMList - list of active VMs
Output: VMid - id of the VM selected for the request
/* Initialization */
1  VMAllocTable = getVMAllocTable();
2  VMReserveTable = getVMReserveTable();
3  LeastLoadedVMid = INVALIDID;
4  minVMCount = MAXVALUE;
/* Find a VM with zero request allocations */
5  foreach VMid ∈ VMList do
6      if VMAllocTable[VMid] == 0 then
7          VMReserveTable[VMid] += 1;
8          return VMid;
9      end
10 end
/* Find a VM with the least request allocations */
11 foreach VMid ∈ VMList do
12     currVMCount = VMAllocTable[VMid] + VMReserveTable[VMid];
13     if currVMCount < minVMCount then
14         LeastLoadedVMid = VMid;
15         minVMCount = currVMCount;
16     end
17 end
18 VMReserveTable[LeastLoadedVMid] += 1;
19 return LeastLoadedVMid;

The proposed load balancer returns the VM id for allocation to the data center controller, and once the task is allocated to the suggested VM id, a notification is sent to the proposed load balancer to increment the VM table entry meant for allocations and decrement the reservation table count of the allocated VM id, as shown in the call flow diagram in figure 5.2. Unlike the ESCE algorithm, the proposed VM load balancer maintains an internal reservation table (VMReserveTable) to record the VM reservations suggested by the load balancer to the data center controller but not yet updated in the allocation table because the allocation notification has not arrived. The proposed load balancer considers both the reservation table entry (VMReserveTable) and the allocation statistics table entry (VMAllocTable) of each VM id when selecting a VM for the next request. Line numbers 2, 7, and 11-17 in algorithm 5.1 represent the modifications made to the ESCE algorithm by our proposed load balancer. The modifications proposed to the ESCE algorithm avoid overloading of VMs during peak hours and also help reduce the response time for tasks waiting on an overloaded VM. The time complexity of algorithm 5.1 is O(n). The experimental results with the proposed load balancer, indicating uniform resource utilization, are shown in table 5.3 and table 5.4, with a description provided in the next section.
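A compact Java rendering of algorithm 5.1 is given below for illustration; the table accessors are hypothetical, and, as in the algorithm, the zero-allocation fast path checks only the confirmed allocation table.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch of the proposed modification: pending reservations count toward load. */
public class ModifiedEsceLoadBalancer {

    private final Map<Integer, Integer> vmAllocTable = new HashMap<>();   // confirmed allocations
    private final Map<Integer, Integer> vmReserveTable = new HashMap<>(); // pending reservations

    /** Algorithm 5.1: pick the least-loaded VM, counting pending reservations too. */
    public int findLeastLoadedVm(List<Integer> vmList) {
        // Lines 5-10: fast path for a VM with zero confirmed allocations.
        for (int vmId : vmList) {
            if (vmAllocTable.getOrDefault(vmId, 0) == 0) {
                vmReserveTable.merge(vmId, 1, Integer::sum);
                return vmId;
            }
        }
        // Lines 11-17: minimize confirmed + reserved load.
        int leastLoadedVmId = -1;
        int minVmCount = Integer.MAX_VALUE;
        for (int vmId : vmList) {
            int count = vmAllocTable.getOrDefault(vmId, 0)
                      + vmReserveTable.getOrDefault(vmId, 0);
            if (count < minVmCount) {
                leastLoadedVmId = vmId;
                minVmCount = count;
            }
        }
        vmReserveTable.merge(leastLoadedVmId, 1, Integer::sum);  // line 18
        return leastLoadedVmId;
    }

    /** On the controller's allocation notification: move one count from reserved to confirmed. */
    public void onAllocationNotification(int vmId) {
        vmReserveTable.merge(vmId, -1, Integer::sum);
        vmAllocTable.merge(vmId, 1, Integer::sum);
    }
}
```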
Figure 5.2: Call Flow Diagram For Proposed Load Balancer

5.4 Experimental setup

The experimental evaluation of the proposed VM load balancing algorithm has been carried out on a well-known simulator called CloudAnalyst (Wickremasinghe et al., 2010). CloudAnalyst is a simulation tool based on the CloudSim library, developed in Java, which provides a GUI interface to configure various user and data center parameters and to perform the experimental work with ease. The experiments to evaluate the competence of the proposed algorithm have been carried out using a configuration of internet users on four different continents, i.e., four user bases; the peak and non-peak user configurations used are given in table 5.2. Requests (tasks) of unit length are considered for simplicity. The data center hosts homogeneous physical machines having hardware resources with configurations of 100 GB of storage and 4 GB of RAM, with each physical machine (PM) equipped with a 4-core CPU having 10K total MIPS (million instructions per second).

Table 5.2: User Bases: Region-wise Statistics of Users

| Region No | Region Name   | Peak Users | Off-Peak Users |
|-----------|---------------|------------|----------------|
| 0         | North America | 35000      | 3500           |
| 1         | South America | 25000      | 2500           |
| 2         | Europe        | 15000      | 1500           |
| 3         | Asia          | 5000       | 500            |

5.5 Experimental results and analysis

The experiments are conducted with two simple configurations: 5 VMs hosted on two physical machines, and 25 VMs hosted on 10 physical machines, with the configurations described in the previous section. The results are analyzed with uniform resource utilization as the primary criterion, checking for non-uniform request assignment to any of the virtual machines in the data centers. The request (task) allocations of each VM for both the current active VM (ESCE) algorithm and the proposed VM load balancer are tabulated in table 5.3 and plotted in figure 5.3 for the case of the data center with the 5-VM configuration. It can be observed from the experimental results that the initial VM ids are allocated a higher number of requests by the ESCE algorithm because of the inconsistent VM allocation table data during allocation request processing. However, it can be noted that the proposed load balancer is able to allocate requests to the VMs uniformly.

Figure 5.3: Comparison Results For 5 VMs In DC

Table 5.3: Comparison Results for the 5-VM Case

| VM Id | Allocations (ESCE) | Allocations (Proposed LB) |
|-------|--------------------|---------------------------|
| 0     | 39554              | 18502                     |
| 1     | 19112              | 18507                     |
| 2     | 14902              | 18503                     |
| 3     | 10097              | 18504                     |
| 4     | 8855               | 18504                     |

Figure 5.4 plots the task assignments for 25 VMs for both the active VM (ESCE) algorithm and the proposed VM load balancer. The task assignment numbers for ESCE and the proposed VM load balancer are also tabulated in table 5.4. It can be observed from the experimental results that, in the 25-VM case too, the ESCE algorithm allocates requests non-uniformly to the VMs, while the proposed load balancer is able to distribute requests uniformly over the 25 VMs.

Figure 5.4: Comparison Results For 25 VMs In DC

It can be noted from the results that the ESCE (active VM) algorithm allocates tasks to VMs unevenly, over-allocating the initial VM ids and under-allocating the remaining VMs, because the algorithm refers to the inconsistent VM table during VM allocation for requests. The results also suggest that the proposed VM load balancer allocates the requests (tasks) to VMs evenly, overcoming the problem of the ESCE (active VM) load balancer in all load conditions.
Table 5.4: Comparison Results for the 25-VM Case

| VM Id | Allocations (ESCE) | Allocations (Proposed LB) |
|-------|--------------------|---------------------------|
| 0     | 42374              | 3680                      |
| 1     | 17324              | 3705                      |
| 2     | 10477              | 3703                      |
| 3     | 6783               | 3704                      |
| 4     | 4683               | 3705                      |
| 5     | 3249               | 3703                      |
| 6     | 2229               | 3705                      |
| 7     | 1613               | 3702                      |
| 8     | 1191               | 3702                      |
| 9     | 835                | 3702                      |
| 10    | 609                | 3699                      |
| 11    | 424                | 3703                      |
| 12    | 299                | 3703                      |
| 13    | 195                | 3700                      |
| 14    | 120                | 3703                      |
| 15    | 59                 | 3702                      |
| 16    | 36                 | 3702                      |
| 17    | 17                 | 3703                      |
| 18    | 8                  | 3703                      |
| 19    | 3                  | 3700                      |
| 20    | 3                  | 3703                      |
| 21    | 5                  | 3702                      |
| 22    | 1                  | 3702                      |
| 23    | 1                  | 3702                      |
| 24    | 2                  | 3702                      |

5.6 Summary

This chapter introduced task scheduling in the cloud system, explained the ESCE load balancing algorithm, and described its problem with uniform VM task allocation during peak load situations. The chapter then presented the research objective and the proposed load balancing algorithm to overcome the problem with ESCE. The experimental setup used to evaluate the proposed load balancer for uniform VM allocation was described. It can be noted from the experimental results that the proposed load balancer is able to solve the VM over-allocation problem observed in the ESCE load balancer.

In the next chapter, the existing solutions available for accessing GPU computing in the cloud are described. The chapter then discusses the existing research challenges and opportunities with load balancing in GPU-enabled clouds. The chapter also describes the current hurdles to efficient utilization of GPUs under the virtualization layer.

Chapter 6
Load balancing in GPU enabled Cloud: Challenges and Opportunities

In recent years, graphics processing units (GPUs) have been gaining importance for their massively parallel computing capability. Their popularity is such that most commercial computing platforms (devices) sold in the market have a variant of GPU installed. Though GPUs were initially used only for graphics applications like gaming and display, over the past few years GPUs have come to be regarded as high-throughput parallel computing platforms suitable for general-purpose computing such as high-performance computing (HPC), machine learning, medical imaging, and inference generation, and for supporting computing for smart cities and infrastructure development. The neural network (machine learning) extensions for deep learning and artificial intelligence built into GPU hardware have only added to their growing popularity.

The ever-increasing demand for GPUs has compelled cloud providers to enable GPU processing inside data centers to support hosting complex HPC, real-time, and virtual desktop applications for their users. Today, almost all mainstream cloud providers like Amazon, Microsoft, and Google have one or many types of GPU-enabled instances on offer. GPU-enabled VM instances can accelerate user applications significantly by offloading the compute-intensive part of the application logic onto blocks of parallel threads in the GPU. However, conventional virtualization techniques do not hold good for GPUs because of inherent differences in architecture, control software, and distributed program/memory models. These differences make GPU provisioning in the virtualized environment, as well as the placement of GPU-enabled VMs (gVMs), more complex and can cause inefficiency in resource utilization.

The work reported in this chapter examines the current infrastructure (software and hardware) available to support GPUs inside cloud data centers and investigates existing challenges concerning efficient workload allocation (involving GPU computing) and provisioning techniques for virtualized GPUs.
The reported work also examines the issues/challenges related to the effective utilization of GPU resources by applications when accessed from the virtualization layer.

6.1 Background study

6.1.1 GPUs and cloud datacenters

The typical block diagram of a GPU is shown in figure 6.1(a). GPUs consist of several thousand single instruction multiple data (SIMD) cores packaged into streaming multiprocessors (SMs). SMs are responsible for executing GPU tasks. Each GPU has its own local memory, called GDDRAM (graphics double data rate RAM), and has two copy engines that can transfer data between GDDRAM and the main memory of the server in both directions simultaneously. The Giga Thread Engine is responsible for scheduling GPU threads onto the streaming multiprocessors (SMs) for execution. GPU tasks are usually submitted to the GPU for execution as groups of threads called blocks. GPUs are usually fitted inside video cards, and each video card can host multiple GPUs. The GPUs are interconnected inside a video card using a PCIe (peripheral component interconnect express) switch, and the video card is connected to the host machine using a PCIe connector. The PCIe connector is connected to the system bus, over which data flows between the GPU and main memory, as shown in figure 6.1(b). Threads are distributed and scheduled inside SMs for execution. GPUs employ cooperative multitasking based on a leftover policy (Siavashi and Momtazpour, 2018) for scheduling thread blocks onto SMs.

Figure 6.1: Block Diagram Of Typical GPU And Video Card

The steps followed in GPU task execution within an application are represented in figure 6.2. Initially, the data required for the GPU tasks is transferred from main memory (RAM) to GPU device memory (GDDRAM). Once the data transfer is complete, the GPU tasks are bundled into blocks of threads and launched onto the GPU for execution. After the execution of all threads is complete, the results are copied back from GPU memory (GDDRAM) to main memory.

Figure 6.2: Typical Flow Of GPU Task Execution In An Application

6.1.2 GPU virtualization in cloud

Virtualization is employed with GPUs to share the same GPU device among multiple user applications residing inside separate VMs. Virtualization helps to use resources efficiently by sharing unused computing power among different tasks.
ii Full virtualization: The approach virtualizes GPUs at the device driver level where a GPU driver is installed inside user VM to communicate with virtual GPU. It will incur a penalty in performance. The hypervisor is responsible for scheduling virtualized GPUs(vGPUs). iii Paravirtualization: The approach is similar to full virtualization. However, the guest OS driver is modified to avoid performance degradation to some extent. 98 The hypervisor is responsible for scheduling virtualized GPUs(vGPUs) in this approach too. iv Hardware-assisted virtualization: The approach is supported by special hard- ware extensions provided by hardware vendors. These hardware extensions are responsible for VM to GPU mappings and parallel (multiplexing) executions of multiple VMs over GPU. The hypervisor may be involved to a minimal extent. 6.2 Research objective The objective of the reported work in this chapter is to investigate current re- search gaps in GPU resource management policies and also study the challenges re- lated to programming for GPUs in virtualized environments. 6.3 GPU resource provisioning techniques in cloud The GPU provisioning to vGPU enabled VMs in cloud data centers is done at four different levels. A Video card allocation The video card allocation technique is responsible for allocating a GPU to a vGPU in the physical server. GPUs are housed in video cards, and the selection of video cards to be allocated for the given vGPU is taken care of by the video card allocation technique. There are three video card allocation techniques that are suggested(Siavashi and Momtazpour, 2018). The simple allocation policy follows first- fit policy wherein the first found video card with GPU that satisfies vGPU resource needs is selected. Breadth-first policy sorts available video cards in the ascending order of their GPU loads and returns lightly loaded video card and depth-first policy sorts the video card in descending order of GPU loads and returns the video card which is just enough to satisfy the vGPU resource needs. 99 B GPU allocation The GPU allocation technique is responsible for allocating a GPU in a video card(with multi-GPUs) to a requested vGPU. Three allocation techniques, simple first Fit, breadth-first, and depth-first, employed in video card allocation are also used for GPU allocation. C GPU enabled VM placement VM placement policy for the VMs(with vGPU attached) is responsible for allo- cating a physical server in a data center. There are two types of placement policies proposed. i First fit Policy: The technique used in VMware Horizon(VMware, 2019), where all the hosts are iterated until VM in question is accepted by a physical host considering VMs resource requirements. ii First fit increasing: The technique(Siavashi and Momtazpour, 2018), first finds the bottleneck resource between each host-VM pairs. Then sorting of all VMs in ascending order is done based on their resource requirements and allocation to physical hosts is done using the first-fit policy. D GPU provisioning The GPU provisioning technique is responsible for defining the sharing policy of a physical GPU among multiple vGPUs. Some of the most commonly used GPU provisioning schemes (Siavashi and Momtazpour, 2018)(Hong et al., 2017) are listed below. i Space shared: one vGPU occupies physical GPU till completion. The second vGPU is allocated only once the first vGPU is completed. ii Time shared: The vGPUs share a physical GPU until co-executing vGPU does not exceed the total MIPS of a given GPU. 
B GPU allocation

The GPU allocation technique is responsible for allocating a GPU within a (multi-GPU) video card to a requested vGPU. The three techniques employed in video card allocation — simple first-fit, breadth-first, and depth-first — are also used for GPU allocation.

C GPU enabled VM placement

The VM placement policy for VMs (with vGPUs attached) is responsible for allocating a physical server in a data center. Two types of placement policies have been proposed.

i First-fit policy: The technique used in VMware Horizon (VMware, 2019), where all the hosts are iterated until the VM in question is accepted by a physical host, considering the VM's resource requirements.

ii First-fit increasing: This technique (Siavashi and Momtazpour, 2018) first finds the bottleneck resource between each host-VM pair. All VMs are then sorted in ascending order of their resource requirements, and allocation to physical hosts is done using the first-fit policy.

D GPU provisioning

The GPU provisioning technique is responsible for defining the sharing policy of a physical GPU among multiple vGPUs. Some of the most commonly used GPU provisioning schemes (Siavashi and Momtazpour, 2018)(Hong et al., 2017) are listed below.

i Space shared: One vGPU occupies the physical GPU until completion. A second vGPU is allocated only once the first vGPU has completed.

ii Time shared: The vGPUs share a physical GPU as long as the co-executing vGPUs do not exceed the total MIPS of the given GPU.

iii FCFS: The first-come, first-serve policy allocates vGPUs in the order in which they arrive.

iv Round-robin: Round-robin is similar to FCFS but assigns a fixed time slice to each vGPU. This policy is also called a fair-share scheme.

v Priority-based: Priority-based provisioning assigns a priority to every vGPU, and the provisioning logic executes vGPUs in the order of their priority.

vi Fair queuing: Fair queuing assigns a start tag to every vGPU and schedules them for execution in increasing order of the start tags. The start tag value is determined by the accumulated usage time of the GPU.

vii Credit-based: The algorithm periodically distributes credits to vGPUs, and each vGPU consumes credits when it is executed on the CPU to exploit the physical GPU. The policy selects a vGPU with a positive credit value.

viii Affinity-based: The algorithm generates affinity scores for a vGPU to estimate the performance impact when it is allocated to a specific resource.

ix SLA-based: An SLA (service level agreement) is an agreement between a cloud service provider and a user about the quality of service (QoS) requirements and the price to be charged. The objective of the SLA-based policy is to meet the SLA requirements while allocating GPU resources.

E Memory and PCIe bandwidth

The GPU device memory and the PCIe bandwidth are two resources inside the GPU device that need to be shared by co-running vGPUs. Each GPU can transfer data in two opposite directions simultaneously. PCIe bandwidth is provisioned on an equal-share basis to all co-executing vGPUs. Device memory is one of the essential bottleneck resources inside the GPU, and it may have an impact on the performance of co-executing vGPUs.

6.4 Current challenges with GPU computing in the cloud

Virtualization is a crucial technology for the efficient utilization of server (PM) resources in cloud data centers. The virtualization solutions for CPU, memory, and network have attained sufficient maturity to be used in data centers for the benefit of both cloud providers and users. However, the conventional technologies of CPU virtualization do not apply to GPU virtualization, because of inherent differences in architecture, programming models, and vendor-specific device driver software. The work reported in this chapter investigates various research challenges/issues concerning efficient physical GPU utilization by current resource management and provisioning techniques in the cloud environment, and also examines several problems with existing frameworks and technologies that limit the ability of user applications or VMs to exploit the real power of GPUs wrapped under the virtualization layer.

6.4.1 Challenges with GPU resource management in cloud

This section discusses various system-level issues that prevent efficient resource management of GPUs and GPU-attached servers inside cloud data centers.

A GPU enabled VM migration

The VM migration process is the re-placement of a VM from a source physical host to a destination physical host in the data center. VM migrations are usually done for performance optimization, to avoid resource contention, and to perform server consolidation during non-peak load situations inside cloud data centers. Though there are many proven algorithms (Choudhary et al., 2017) for the VM migration process, migration becomes complicated when a GPU-enabled VM is to be migrated. The vGPU attached to the VM will have its process state and data inside GPU memory.
The VM migration process has to wait until the application finishes its GPU tasks, or the ongoing GPU tasks need to be aborted on the source machine and resumed on the destination machine. The extra computation, or the extra delay caused by the vGPU computation, makes the VM migration process inefficient. When GPUs in cloud servers are virtualized using a hardware-assisted virtualization technology like NVIDIA GRID (NVidia, 2019), the virtualization technology bypasses the core virtualization layer in the server for creating and managing virtual images of the GPU. When a vGPU has to be live-migrated from such a hardware-assisted virtualized GPU, retrieving the GPU task state and restoring it on a remote GPU is a complex task. Novel mechanisms need to be established to efficiently live-migrate hardware-assisted virtualized GPU images from one GPU to another remote GPU.

B Power modeling of GPUs

GPUs are manufactured by various vendors, and the components and architectures of GPUs are inherently different from one another. Because of their hardware composition, their power consumption and performance characteristics vary from one another. Unlike for CPUs, power consumption and performance benchmarks (SPEC, 2011) for server-scale GPUs are not available yet. Power-aware resource provisioning policies for GPUs therefore have to rely on a mathematical model for power consumption estimation. The equation (Siavashi and Momtazpour, 2018) for power consumption analysis is given by equation 6.1.

P(f, U) = a_3 \, f \, U + a_2 \, f + a_1 \, U + a_0 \qquad (6.1)

It is suggested that there is a linear correlation between power consumption and the frequency and utilization of the SMs in a GPU. In equation 6.1, the frequency f and utilization U determine the power consumption approximation, where a_0, a_1, a_2, and a_3 are constants. Because the power consumption approximation is based on a mathematical model, the inherent physical composition of the GPU contributing to its power consumption is not considered. This factor may impact the efficiency of the resource provisioning algorithms.
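Equation 6.1 translates directly into code. The sketch below is a plain rendering of the model; the coefficients must be fitted per GPU type, and no fitted values are implied here.

```java
/** Linear GPU power model of equation 6.1: P(f, U) = a3*f*U + a2*f + a1*U + a0. */
public class GpuPowerModel {

    private final double a0, a1, a2, a3;  // model constants, fitted per GPU type

    public GpuPowerModel(double a0, double a1, double a2, double a3) {
        this.a0 = a0;
        this.a1 = a1;
        this.a2 = a2;
        this.a3 = a3;
    }

    /**
     * @param f SM clock frequency (in the unit the coefficients were fitted for)
     * @param u SM utilization in [0, 1]
     * @return estimated power draw in watts
     */
    public double power(double f, double u) {
        return a3 * f * u + a2 * f + a1 * u + a0;
    }
}
```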
Such adaptive schemes can make GPU allocation models more responsive to the goal of power-saving during non-peak hours and also enables high performance during peak hours in the data centers. E GPU memory pollution with fair-share policy for GPU provisioning The fair share policy for GPU provisioning allocates a time slice of physical GPU to many vGPUs in round-robin fashion. GPU provisioning policies like fair share will have to make many hosts to device and device to host data transactions. If vGPU memory footprint(size) is not small, the vGPUs switching makes the process- ing very inefficient because of multiple data transfers involving the main memory and GPU device. Memory transfers are considered as major bottlenecks to achieve high throughput, and GPU device memory size limits the level of multitasking on GPU if data associated with GPU tasks is large. It is vital to consider data/memory trans- actions between host and GPU for GPU provisioning decisions and also GDDRAM memory size for scheduling GPU tasks on vGPUs. 104 6.4.2 Challenges with programming vGPUs The section describes various user/programmer side issues that prevent exploit- ing the true power of GPUs by accessing them from the top of the virtualization layer. A Target GPU generations The GPUs available in the market possess different computing capabilities and support a varying degree of features because of the generation and type. If applications are ignorant of the target GPU generation, type or version, the program design may not be able to truly exploit the power of physical GPU. For instance, the device memory sizes, shared memory size, and the number of SIMD cores will vary between generations of GPU, and such details will impact the design of data structures and thread block sizes in algorithms. To overcome performance issues, the GPU allocation and provisioning techniques can prioritize higher generations of GPUs over lower generations for allocation decisions. B Heterogeneity in GPUs and multiple frameworks Multiple vendors manufacture data center-class GPUs in the market, and they possess different hardware extensions to support various features such as deep learn- ing, AI solutions. Such heterogeneity in GPUs makes VM re-placement a complicated process. If VMs with GPU Tasks include demands for such additional features sup- ported inside GPU, then such additional constraints need to be considered for VM placement. The GPGPU programs use different frameworks (CUDA, OpenCL, Vul- can, etc.) for accessing and computing on GPUs. Application using the CUDA framework for their GPU Tasks inside VMs can only be allocated to NVIDIA GPUs inside data centers. The resource management module for GPU provisioning needs to consider such hardware and software related constraints for making VM placements. C Security aspects Some cross-platform GPU frameworks like OpenCL delay the GPU specific logic (source code) compilations and GPU executable binary generation till run time if 105 target GPU device makes, version or generation is not known beforehand. Such VMs with delayed compilation process should be placed in secured environments to avoid cross VM attacks because application logic may be prone to leakage and may be used with malicious intent. The GPU allocation and provisioning policy have to consider such constraints for VM placements. 
Some VM applications (Hong et al., 2017) may mount a denial-of-service attack on the GPU by submitting a massive number of GPU tasks to the underlying GPU devices, denying GPU resources to other co-allocated VMs. There is a need for a novel control mechanism in the GPU virtualization layer to detect such VMs and prevent them from overusing the GPU device.

The research challenges and opportunities with GPU computing in virtualized environments discussed in this chapter can be addressed to design an efficient GPU resource management framework for cloud data centers that improves performance and GPU resource utilization.

6.5 Summary

This chapter described the underlying architecture of GPUs and the GPU virtualization software and hardware infrastructure currently available in cloud data centers, and then discussed various challenges from the perspectives of GPU resource management and programming virtual GPUs (vGPUs), to motivate further research on load balancing techniques in the GPU-enabled cloud. Further research is needed to solve some of the resource management issues discussed in this chapter, so as to improve GPU-enabled VM placement, GPU resource provisioning, and power/cost optimization algorithms.

In the next chapter, we summarize our research contributions and provide directions for future work.

Chapter 7

Conclusions and Future Work

To conclude this thesis, we first summarize the research contributions of the work reported here. Although the techniques and concepts presented in this thesis take a step forward in addressing some of the relevant factors, several challenges remain to be addressed to improve the existing resource management techniques used in cloud data centers. We list some extensions to the reported work and provide directions for future research.

7.1 Summary of contributions

The techniques proposed in this thesis for leveraging contextual parameters to improve load balancing decisions at multiple levels can be used as standalone concepts. These techniques can be thought of as off-the-shelf entities for enhancing existing and upcoming load balancing algorithms in the cloud. Brief descriptions of the contributions of this thesis follow.

• The first problem addressed in this thesis is the consideration of physical machine performance-to-power characteristics (power efficiency) and data center load characteristics in the VM placement optimization process. We have presented a framework for the collection and sharing of contextual information in data centers. Further, algorithms are presented for load context detection, VM placement, host consolidation, and VM optimization tasks for power saving. The experimental results show that our proposed context-aware VM placement optimization framework can save approximately 8-10% of power in lightly and heavily loaded cases and 2-6% in moderately loaded cases for synthetic workloads. With real-world workload traces, a power saving of 1-3% is achieved by the proposed solution.

• Electricity prices vary across geographical locations around the globe. We have addressed the problem of cost saving in geographically dispersed data centers by considering electricity price and response time as parameters, and have presented a novel cloud broker algorithm for balancing user traffic among the available geo-distributed data centers.
The experimental results suggest that our proposed technique is able to shift load to cheaper data centers, ranging from 3-6% (in experiment categories E2 and E3) to about 50% (in experiment category E4), whenever there exists a cheaper data center (with a lower electricity price) whose estimated response time is relatively small compared to that of the closer data center.

• Peak-hour performance is critical for data centers to meet the high demand for computing resources, and both over-allocation and under-allocation of user tasks to VMs can cause performance degradation for cloud applications. We have investigated the existing ESCE (active VM) load balancing algorithm with respect to uniform resource utilization and proposed a solution to the performance inefficiency of the ESCE (active VM) load balancer during peak load situations. The experimental results showed that the proposed VM load balancer allocates tasks evenly to the available VMs, overcoming the limitation of the ESCE VM load balancer.

• GPU computing in cloud data centers is gaining momentum swiftly because of the massive parallel computing demands of applications such as HPC, deep learning, and VDI. Though CPU virtualization techniques are mature, GPU virtualization and resource management is still a budding area of research, and VM placement techniques involving GPU allocation and provisioning need to consider additional parameters and constraints. We have presented a summary of the current GPU resource management techniques available in cloud data centers. Further, we have presented the remaining challenges concerning GPU resource management and programming virtual GPUs (vGPUs) to motivate further research in this domain.

7.2 Directions for future work

Though the techniques and concepts presented in this thesis take a step forward in addressing some of the relevant factors in the domain of resource management in cloud computing, a few extensions to the reported work are possible to further improve the load balancing process in the cloud environment. We present some of these extensions and future directions below.

• Co-located VMs on host machines can cause performance degradation due to conflicting resource demands, which can impact the overall power consumption of the data center. A thorough investigation is needed to understand the impact of interference and affinity among co-located VMs on host resources from the perspective of power consumption. Affinity and interference can then be modeled as additional contextual parameters for VM placement decisions.

• The framework proposed for VM placement optimization has two modules: the GWS (global workload scheduler) at a master node and an LCM (local context manager) at each physical host, which together achieve context-aware VM placement optimization in the data center. The proposed framework can be extended to a hierarchical GWS to support a multi-DC setup or logical scaling of the resource management framework.

• The VM placement optimization process can consider user geo-locations in a geographically dispersed, multi-data-center scenario when making placement decisions, to improve performance and overall power cost.

• GPU-enabled computing in the cloud is a relatively new area of cloud computing, and resource management techniques have yet to attain the maturity needed to use GPUs wrapped in the virtualization layer efficiently. The research challenges reported in this thesis can be taken up as further work.
On a closing note, the domain of cloud computing has had a remarkable journey so far. The benefits of cloud computing have led more and more organizations to look to the cloud as a solution for deploying their applications, ranging from a simple web server to complex HPC applications. The contributions made in this thesis extend that journey by incorporating contextual parameters such as power efficiency, varying electricity prices, and load situations into load balancing at multiple levels in cloud data centers. Nonetheless, there are many evolving challenges for cloud computing researchers to address, making the journey ahead one of discovery.

References

Abdelsamea, A., Hemayed, E., Eldeeb, H. and Elazhary, H. (2014). “Virtual Machine Consolidation Challenges: A Review.” International Journal of Innovation and Applied Studies, 8.

Ajit, M. and Vidya, G. (2013). “VM level load balancing in cloud environment.” 2013 Fourth International Conference on Computing, Communications and Networking Technologies (ICCCNT). 1–5.

Armbrust, M., Fox, A., Griffith, R., Joseph, A., Katz, R., Konwinski, A., Lee, G., Patterson, D., Rabkin, A. and Zaharia, M. (2009). “Above the Clouds: A Berkeley View of Cloud Computing.” University of California at Berkeley UCB/EECS-2009-28, February, 28.

Ashikur, R., Liu, X. and Kong, F. (2014). “A Survey on Geographic Load Balancing Based Data Center Power Management in the Smart Grid Environment.” Communications Surveys & Tutorials, IEEE, 16, 214–233.

Beloglazov, A. and Buyya, R. (2012). “Optimal Online Deterministic Algorithms and Adaptive Heuristics for Energy and Performance Efficient Dynamic Consolidation of Virtual Machines in Cloud Data Centers.” Concurrency and Computation: Practice and Experience, 24.

Berkeley (2016). “United States Data Center Energy Usage Report.” https://datacenters.lbl.gov/resources/united-states-data-center-energy-usage (Accessed on Oct, 2019).

Calheiros, R., Ranjan, R., Beloglazov, A., De Rose, C. and Buyya, R. (2011). “CloudSim: A Toolkit for Modeling and Simulation of Cloud Computing Environments and Evaluation of Resource Provisioning Algorithms.” Software Practice and Experience, 41, 23–50.

Chiang, Y.-J., Ouyang, Y.-C. and Hsu, C.-H. (2014). “An Efficient Green Control Algorithm in Cloud Computing for Cost Optimization.” IEEE Transactions on Cloud Computing, 3, 1–1.

Choudhary, A., Govil, M., Singh, G., Awasthi, L., Pilli, E. and Kapil, D. (2017). “A critical survey of live virtual machine migration techniques.” Journal of Cloud Computing, 6, 23.

Dashti, S. and Rahmani, A. (2015). “Dynamic VMs placement for energy efficiency by PSO in cloud computing.” Journal of Experimental & Theoretical Artificial Intelligence, 1–16.

Dong, J.-k., Wang, H.-b., Li, Y.-y. and Cheng, S. (2014). “Virtual machine placement optimizing to improve network performance in cloud data centers.” The Journal of China Universities of Posts and Telecommunications, 21, 62–70.

Fan, X., Weber, W.-D. and Barroso, L. A. (2007). “Power Provisioning for a Warehouse-sized Computer.” SIGARCH Comput. Archit. News, 35(2), 13–23.

Farooqui, N., Barik, R., Lewis, B., Shpeisman, T. and Schwan, K. (2016). “Affinity-aware work-stealing for integrated CPU-GPU processors.” ACM SIGPLAN Notices, 51, 1–2.

Geeta and Singh, C. (2014). “Load Balancing in Distributed System Using FCFS Algorithm with RBAC Concept and Priority Scheduling.” International Journal of Recent Development in Engineering and Technology, 3(6), 33–39.
Goudarzi, H. and Pedram, M. (2013). “Geographical Load Balancing for Online Service Applications in Distributed Datacenters.” 2013 IEEE Sixth International Conference on Cloud Computing. 351–358.

Greenberg, A., Hamilton, J. R., Maltz, D. and Patel, P. (2009). “The Cost of a Cloud: Research Problems in Data Center Networks.” Computer Communication Review, 39, 68–73.

Grosu, D. and Chronopoulos, A. T. (2004). “Algorithmic mechanism design for load balancing in distributed systems.” IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 34(1), 77–84.

Gu, L., Zeng, D., Barnawi, A., Guo, S. and Stojmenovic, I. (2015). “Optimal Task Placement with QoS Constraints in Geo-Distributed Data Centers Using DVFS.” IEEE Transactions on Computers, 64, 2049–2059.

Guo, Y. and Fang, Y. (2013). “Electricity Cost Saving Strategy in Data Centers by Using Energy Storage.” Parallel and Distributed Systems, IEEE Transactions on, 24, 1149–1160.

Gupta, V., Gavrilovska, A., Schwan, K., Kharche, H., Tolia, N., Talwar, V. and Ranganathan, P. (2009). “GViM: GPU-accelerated virtual machines.” Proceedings of the 3rd ACM Workshop on System-level Virtualization for High Performance Computing, HPCVirt’09.

Gupta, V., Schwan, K., Tolia, N., Talwar, V. and Ranganathan, P. (2011). “Pegasus: Coordinated Scheduling for Virtualized Accelerator-based Systems.” Proceedings of the 2011 USENIX Conference on USENIX Annual Technical Conference. USENIX ATC’11, USENIX Association, Berkeley, CA, USA, 3–3.

Hamilton, J. (2019). “Overall Data Center Costs.” https://perspectives.mvdirona.com/2010/09/overall-data-center-costs/ (Accessed on Oct, 2019).

Hong, C.-H., Spence, I. and Nikolopoulos, D. (2017). “GPU Virtualization and Scheduling Methods: A Comprehensive Survey.” ACM Computing Surveys, 50, 1–37.

Jain, M. (2019). “How Perficient are Cloud Deployment Models for N/W Storage Needs?” https://www.konstantinfo.com/blog/cloud-deployment-model (Accessed on Sep. 04, 2019).

Kanagavelu, R., Lee, B., Le, N., Mingjie, L. and Aung, K. (2014). “Virtual machine placement with two-path traffic routing for reduced congestion in data center networks.” Computer Communications, 53, 1–12.

Kusic, D., Kephart, J., Hanson, J., Kandasamy, N. and Jiang, G. (2008). “Power and Performance Management of Virtualized Computing Environments Via Lookahead Control.” volume 12. 3–12.

Le, T. N., Liang, J., Liu, Z., Sitaraman, R. K., Nair, J. and Choi, B. J. (2017). “Optimal Energy Procurement for Geo-distributed Data Centers in Multi-timescale Electricity Markets.” SIGMETRICS Perform. Eval. Rev., 45(2), 58–63.

Li, W., Tordsson, J. and Elmroth, E. (2011). “Virtual machine placement for predictable and time-constrained peak loads.” Lecture Notes in Computer Science, 7150, 120–134.

Li, X., Qian, Z., Lu, S. and Wu, J. (2013). “Energy efficient virtual machine placement algorithm with balanced and improved resource utilization in a data center.” Mathematical and Computer Modelling, 58, 1222–1235.

Liu, C., Shen, C., Li, S. and Wang, S. (2014). “A new evolutionary multi-objective algorithm to virtual machine placement in virtualized data center.” 2014 IEEE 5th International Conference on Software Engineering and Service Science. 272–275.

Mali, A. and Vidya, G. (2013). “VM level load balancing in cloud environment.” 2013 4th International Conference on Computing, Communications and Networking Technologies, ICCCNT 2013. 1–5.
Masdari, M., Nabavi, S. and Ahmadi, V. (2016). “An Overview of Virtual Machine Placement Schemes In Cloud Computing.” Journal of Network and Computer Applications, 66, 106–127.

Mell, P. and Grance, T. (2011). “The NIST definition of cloud computing.” Communications of the ACM, 53.

Menychtas, K., Shen, K. and Scott, M. L. (2014). “Disengaged Scheduling for Fair, Protected Access to Fast Computational Accelerators.” SIGPLAN Not., 49(4), 301–316.

Mishra, R. K., Kumar, S. and Sreenu Naik, B. (2014). “Priority based Round-Robin service broker algorithm for Cloud-Analyst.” 2014 IEEE International Advance Computing Conference (IACC). 878–881.

Mishra, S., Sahoo, B. and Parida, P. (2018). “Load Balancing in Cloud Computing: A big Picture.” Journal of King Saud University.

Moreno, I., Yang, R., Xu, J. and Wo, T. (2013). “Improved energy-efficiency in cloud datacenters with interference-aware virtual machine placement.” 1–8.

Mosa, A. and Paton, N. (2016). “Optimizing virtual machine placement for energy and SLA in clouds using utility functions.” Journal of Cloud Computing, 5.

Nadeem, S. and Mohammed, F. (2015). “Static Load Balancing Algorithms In Cloud Computing: Challenges and Solutions.” International Journal of Scientific and Technology Research, 4, 353–355.

Nadjaran Toosi, A., Qu, C., de Assunção, M. D. and Buyya, R. (2017). “Renewable-aware Geographical Load Balancing of Web Applications for Sustainable Data Centers.” Journal of Network and Computer Applications, 83(C), 155–168.

NVidia (2019). “NVIDIA GRID Technology.” https://www.nvidia.com/en-us/data-center/virtual-gpu-technology/ (Accessed on Oct, 2019).

PlanetLab (2011). “PlanetLab Workload Traces.” https://github.com/beloglazov/planetlab-workload-traces (Accessed on Sep, 2019).

Sayeedkhan, P. N., Nanded, V., S, M. S. B. and Nanded, V. (2014). “Virtual Machine Placement Based on Disk I/O Load in Cloud.” International Journal of Computer Science and Information Technologies, 5.

Sengupta, D., Belapure, R. and Schwan, K. (2013). “Multi-tenancy on GPGPU-based Servers.” Proceedings of the 7th International Workshop on Virtualization Technologies in Distributed Computing. VTDC ’13, ACM, 3–10.

Siavashi, A. and Momtazpour, M. (2018). “GPUCloudSim: an extension of CloudSim for modeling and simulation of GPUs in cloud data centers.” The Journal of Supercomputing, 2535–2561.

S. Jyothsna (2016). “Distributed Load Balancing in Cloud using Honey Bee optimization.” International Journal of Emerging Trends & Technology in Computer Science, 5(6), 102–106.

SPEC (2011). “The SPECpower benchmark.” http://www.spec.org/power_ssj2008/ (Accessed on Sep, 2019).

Sudevalayam, S. and Kulkarni, P. (2011). “Affinity-Aware Modeling of CPU Usage for Provisioning Virtualized Applications.” Proceedings - 2011 IEEE 4th International Conference on Cloud Computing, CLOUD 2011. 139–146.

Suzuki, Y., Kato, S., Yamada, H. and Kono, K. (2014). “GPUvm: Why Not Virtualizing GPUs at the Hypervisor?” Proceedings of the 2014 USENIX Conference on USENIX Annual Technical Conference. USENIX ATC’14, USENIX Association, Berkeley, CA, USA, 109–120.

Toosi, A., Vanmechelen, K., Ramamohanarao, K. and Buyya, R. (2014). “Revenue Maximization with Optimal Capacity Control in Infrastructure as a Service Cloud Markets.” IEEE Transactions on Cloud Computing, 3, 1–1.

Toosi, A. N. and Buyya, R. (2015). “A Fuzzy Logic-Based Controller for Cost and Energy Efficient Load Balancing in Geo-distributed Data Centers.” 2015 IEEE/ACM 8th International Conference on Utility and Cloud Computing (UCC). 186–194.
Tripathi, R., Vignesh, S., Tamarapalli, V., Chronopoulos, A. and Siar, H. (2017). “Non-cooperative power and latency aware load balancing in distributed data centers.” Journal of Parallel and Distributed Computing, 107.

VMware (2019). “Horizon—virtual desktop infrastructure.” https://www.vmware.com/products/horizon.html (Accessed on Oct, 2019).

VMware (2019). “Virtualization.” https://www.vmware.com/in/solutions/virtualization.html (Accessed on Sep, 2019).

Wickremasinghe, B., Calheiros, R. and Buyya, R. (2010). “CloudAnalyst: A CloudSim-Based Visual Modeller for Analysing Cloud Computing Environments and Applications.” 446–452.

Wikipedia (2017). “Electricity Price at Geographical locations.” https://en.wikipedia.org/wiki/Electricity_pricing/ (Accessed on Jan, 2017).

Xiao, Z., Jiang, J., Zhu, Y., Ming, Z., Zhong, S. and Cai, S. (2014). “A Solution of Dynamic VMs Placement Problem for Energy Consumption Optimization Based on Evolutionary Game Theory.” Journal of Systems and Software, 101.

Zhang, C., Yao, J., Qi, Z., Yu, M. and Guan, H. (2014). “vGASA: Adaptive Scheduling Algorithm of Virtualized GPU Resource in Cloud Gaming.” IEEE Transactions on Parallel and Distributed Systems, 25(11), 3036–3045.

Zhao, D.-M., Zhou, J. and Li, K. (2019). “An energy-aware algorithm for virtual machine placement in cloud computing.” IEEE Access, 07, 55659–55668.

List of publications

Journal publications

1. Ashwin Kumar Kulkarni and Annappa B. (2019). “Context aware VM placement optimization technique for heterogeneous IaaS cloud”, IEEE Access, Volume 7, Issue 1, pp 89702-89713. (DOI: 10.1109/ACCESS.2019.2926291)

2. Ashwin Kumar Kulkarni and Annappa B. “GPU computing in cloud: Resource management and programming perspectives in virtualized environments”. (Under review in Journal of King Saud University - Computer and Information Sciences, Elsevier).

Conference publications

1. Ashwin Kumar Kulkarni and Annappa B. (2017). “Cost aware service broker algorithm for load balancing geo-distributed data centers in cloud”. In IEEE International Conference on Signal Processing, Informatics, Communication and Energy Systems, Kollam, India, pp 1-5. (DOI: 10.1109/SPICES.2017.8091337)

2. Ashwin Kumar Kulkarni and Annappa B. (2015). “Load balancing strategy for optimal peak hour performance in cloud datacenters”. In IEEE International Conference on Signal Processing, Informatics, Communication and Energy Systems, Calicut, India, pp 1-5. (DOI: 10.1109/SPICES.2015.7091496)

Brief Bio-Data

Mr. Ashwin Kumar
Research Scholar
Department of Computer Science and Engineering
National Institute of Technology Karnataka, Surathkal
P.O. Srinivasnagar
Mangalore - 575025
Phone: +91 9980156977
Email: ashwin.sony@gmail.com

Permanent Address
Ashwin Kumar
S/o Dr. D.V. Kulkarni
Noorandeshwar Colony
Moratagi - 586123
Sindagi (Tq.), Vijayapura (Dist.)
Karnataka, INDIA

Qualification
M. Tech. in Computer Science and Engineering, Visvesvaraya Technological University, Belgaum, Karnataka, 2011.
B. E. in Computer Science and Engineering, Visvesvaraya Technological University, Belgaum, Karnataka, 2004.