Dynamic Placement of Virtual Machines for

January 21, 2018 | Author: Anonymous | Category: Math, Statistics And Probability, Statistics



Dynamic Placement of Virtual Machines for Managing SLA Violations

Norman Bobroff, Andrzej Kochut, Kirk Beaty

Some slide content adapted from Alexander Nus

Presented by Jon Logan

Motivation



Virtual machines are becoming more and more popular throughout our datacenters



Servers use electricity 

Electricity can be expensive!



How do we minimize the number of utilized machines, while meeting our SLA obligations?



Usage patterns of machines are NOT static, and generally change dynamically

Goals



Maximize utilization of active machines



Minimize Service Level Agreement (SLA) violations



Minimize number of active machines 



Power off unused machines to conserve cost (electricity)

Essentially, minimize cost while meeting SLA guarantees

Static Allocation



All machines are taken offline, and historical usage is used to determine ideal placement



Happens very infrequently (~weeks or months)



Must interrupt service to relocate



Utilization is not consistent in many cases! Demand may vary significantly within the period between allocations

Dynamic Allocation



VMs are seamlessly migrated between machines based on predicted demand



Happens frequently (every few minutes or hours)



Live migration 



Minimal (~ms) service disruptions during migration

Allows for allocations to more closely follow demand

Live Migration



Moves a VM image between machines without service interruption



The paper cites a ~45 second transition time



VM must be serialized and transferred over the network



Puts a lower bound on our reallocation period

Can’t reallocate faster than we can migrate!

Service Level Agreement





Essentially a contract between the provider and the customer stating that resources R will be available X% of the time

Violations cost money!



X is usually high (e.g., 95%)

VMs do not necessarily use this entire resource allocation at all times, but it must be available should they choose to use it 

E.g., a VM doing batch processing may only do substantial work between 12:00 AM and 1:00 AM
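A minimal sketch of how an SLA of this form could be checked against a usage trace. The function names and the sample trace are illustrative assumptions, not taken from the paper; demand and capacity are in percent of one machine's CPU.

```python
def violation_rate(demand, capacity):
    """Fraction of measurement intervals in which demand exceeded capacity."""
    overflows = sum(1 for d in demand if d > capacity)
    return overflows / len(demand)


def meets_sla(demand, capacity, availability):
    """True if capacity was sufficient in at least `availability` of the intervals."""
    return 1 - violation_rate(demand, capacity) >= availability


# Hypothetical trace: 100 intervals, 2 of which exceed the reserved capacity.
trace = [40] * 98 + [90, 95]
print(violation_rate(trace, capacity=80))
print(meets_sla(trace, capacity=80, availability=0.95))
```

With 2 overflows in 100 intervals the violation rate is 2%, which stays within a 95% availability target.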

Static vs Dynamic Usages



Workloads are not static!



Try to predict the usage of the VM over the next interval T



Reallocate machines to be able to meet that predicted usage



Need to be within a certain percentile to meet SLA requirements



Capacity savings is simply 



Static Allocation - (Predicted Usage + Error Factor)

Repeat this process every time T
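The savings formula above can be illustrated with a small sketch. The function name and the sample numbers (CPU in percent of one machine) are illustrative assumptions, not values from the paper.

```python
def capacity_savings(static_allocation, predicted_usage, error_margin):
    """Capacity freed by reserving the forecast plus a safety margin
    instead of the static allocation."""
    dynamic_allocation = predicted_usage + error_margin
    # Clamp at zero: if the dynamic reservation is not smaller, nothing is saved.
    return max(0, static_allocation - dynamic_allocation)


# Hypothetical VM: statically sized for 90%, forecast 50% with a 10% margin.
saving = capacity_savings(static_allocation=90, predicted_usage=50, error_margin=10)
print(saving)
```

The better the forecast, the smaller the error margin can be, and the larger the savings in each interval T.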

What Workloads Are Best For Dynamic Allocation? 

Not all Workloads are created equal 

Some tend to be better than others



Constant workloads = bad!



A workload is an ideal candidate for dynamic allocation if





It has strong variability AND



It has strong autocorrelation combined with periodic behavior

Essentially, you need to have a decent degree of variability, and be able to reasonably predict its usage
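As a rough sketch, both criteria can be estimated from a usage trace. The coefficient of variation is used here as one possible measure of variability; that choice, the function names, and the sample traces are assumptions for illustration rather than the paper's definitions.

```python
from statistics import mean, pstdev


def coefficient_of_variation(series):
    """Standard deviation divided by the mean: a simple variability measure."""
    return pstdev(series) / mean(series)


def autocorrelation(series, lag):
    """Sample autocorrelation of the series at the given lag."""
    m = mean(series)
    denom = sum((x - m) ** 2 for x in series)
    num = sum((series[i] - m) * (series[i + lag] - m)
              for i in range(len(series) - lag))
    return num / denom


# Hypothetical CPU traces (percent): one strongly periodic, one nearly constant.
periodic = [10, 50, 90, 50] * 6
flat = [50, 51, 49, 50] * 6

print(coefficient_of_variation(periodic), autocorrelation(periodic, 4))
print(coefficient_of_variation(flat), autocorrelation(flat, 4))
```

The periodic trace shows both high variability and high autocorrelation at the lag of its period, which is the combination the slide describes as ideal.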

Workload 3a



Strongly variable – good



Autocorrelation ~0.8 – good



Weak periodic behavior – bad



Verdict – Good 

Large variability offers significant potential for optimization



Strong autocorrelation makes it possible to obtain a low-error prediction

Workload 3b



Weakly variable – bad



Decaying autocorrelation – bad



Weak periodic behavior – bad



Verdict – Bad 

Low variability makes potential gain low



Weak autocorrelation and no periodic component make it difficult to predict demand

Workload 3c



Strongly variable – good



Strong autocorrelation – good



Strong periodic behavior – good



Verdict – Very Good 

An ideal case for dynamic allocation

Potential Gain

Demand forecast algorithm



Determine the periods in demand using ‘common sense’ aided by a periodogram (e.g., time of day, day of week, …)



Decompose the process into a deterministic periodic component Di and a residual component ri



Estimate the deterministic part using averaging of multiple smoothed historical periods



Fit Auto Regressive Moving Average (ARMA) model to the residual process



Use the combined components for demand prediction

Ui = Di + ri
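The decomposition Ui = Di + ri can be sketched as follows. This is a simplified illustration under stated assumptions: the period is given rather than found with a periodogram, the smoothing step is omitted, and a plain AR(1) fit stands in for the full ARMA model. All function names are hypothetical.

```python
from statistics import mean


def periodic_component(history, period):
    """Estimate D by averaging the samples observed at each phase of the period."""
    return [mean(history[phase::period]) for phase in range(period)]


def fit_ar1(residuals):
    """Least-squares AR(1) coefficient phi in r[i+1] ~ phi * r[i]."""
    num = sum(a * b for a, b in zip(residuals, residuals[1:]))
    den = sum(a * a for a in residuals[:-1])
    return num / den if den else 0.0


def forecast_next(history, period):
    """One-step forecast U = D + r from the periodic part plus an AR(1) residual."""
    d = periodic_component(history, period)
    residuals = [u - d[i % period] for i, u in enumerate(history)]
    phi = fit_ar1(residuals)
    next_phase = len(history) % period
    return d[next_phase] + phi * residuals[-1]


# Hypothetical, perfectly periodic trace: the residual is zero everywhere,
# so the forecast is just the periodic value at the next phase.
print(forecast_next([10, 50, 90, 50] * 3, period=4))
```

For a trace with noise around the periodic pattern, the AR(1) term carries part of the most recent deviation into the next prediction.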

Management Algorithm



Goal is to minimize the time-averaged number of active servers without violating the SLA



Machines that are not utilized to handle VMs are powered off or put in a low power state 

Will be reactivated if/when required (at the earliest, in the next period)



The time to power on & migrate must be less than the period T



Responsible for actual migrations of machines



Placing of VMs is essentially a version of the bin packing problem 

NP-hard!



An approximation based on the first-fit heuristic is used instead
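A minimal sketch of first-fit placement, assuming a single resource, identical machines, and demands given in percent of one machine's capacity. The function name and the sample demands are illustrative; the paper's actual remapping step may differ in detail.

```python
def first_fit(demands, capacity):
    """Place each VM on the first machine with enough remaining capacity.

    Returns a list of machines, each a list of VM indices.
    """
    machines = []  # VM indices placed on each machine
    free = []      # remaining capacity of each machine
    for vm, demand in enumerate(demands):
        if demand > capacity:
            raise ValueError(f"VM {vm} does not fit on any machine")
        for m in range(len(machines)):
            if demand <= free[m]:
                machines[m].append(vm)
                free[m] -= demand
                break
        else:
            # No machine that is already on has room: power on a new one.
            machines.append([vm])
            free.append(capacity - demand)
    return machines


placement = first_fit([50, 70, 30, 20, 40], capacity=100)
print(placement)  # [[0, 2, 3], [1], [4]]
```

First-fit runs in O(N·M) time and does not guarantee the minimum number of machines, which is the trade-off accepted for avoiding the NP-hard exact problem.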

Management Algorithm



Measure – Measure usage



Forecast – Predict usage for the next window



Remap – Relocate machines if necessary



Perform this (MFR) at regular intervals



Designed to try to predict the “best we can do”
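The Measure, Forecast, Remap cycle can be sketched as one step that is called at every interval. The three callables passed in are placeholders; the stand-ins used below (last observed value as forecast, a naive sequential packing as remap) are assumptions for illustration, not the paper's components.

```python
def mfr_step(histories, measure, forecast, remap):
    """Run one Measure-Forecast-Remap iteration and return the new placement."""
    # Measure: append the latest observed usage of every VM.
    for vm, history in histories.items():
        history.append(measure(vm))
    # Forecast: predict each VM's demand for the next window.
    predicted = {vm: forecast(history) for vm, history in histories.items()}
    # Remap: compute a placement that fits the predicted demand.
    return remap(predicted)


def naive_remap(predicted, capacity=100):
    """Fill machines one after another in the order the VMs are given."""
    placement, room = [[]], capacity
    for vm, demand in predicted.items():
        if demand > room:
            placement.append([])
            room = capacity
        placement[-1].append(vm)
        room -= demand
    return placement


# Hypothetical current CPU usage per VM, in percent of one machine.
latest = {"vm-a": 60, "vm-b": 30, "vm-c": 50}
histories = {vm: [] for vm in latest}
placement = mfr_step(histories, latest.get, lambda h: h[-1], naive_remap)
print(placement)  # [['vm-a', 'vm-b'], ['vm-c']]
```

Keeping the three stages as separate callables makes it easy to swap in a better forecaster or packing heuristic without touching the loop.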

Management Algorithm Overview

Key Terms



N – number of virtual machines



M – number of physical machines



Cm – Maximum capacity of physical machine



f^n_{i,k} – forecast value for the resource demand of VM n at interval i+k



R – migration interval



Cp(μ, σ²) – (1-p)-percentile of a Gaussian distribution with mean μ and variance σ²
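The percentile Cp(μ, σ²) can be computed with Python's standard library. This sketch assumes p is the tolerated overflow probability, so the result is the capacity that Gaussian demand exceeds only with probability p. The function name and the sample numbers are illustrative.

```python
from statistics import NormalDist


def gaussian_percentile(p, mu, sigma2):
    """(1-p)-percentile of a Gaussian with mean mu and variance sigma2."""
    return NormalDist(mu, sigma2 ** 0.5).inv_cdf(1 - p)


# Hypothetical forecast: mean demand 40% CPU, forecast-error variance 25.
needed = gaussian_percentile(p=0.05, mu=40, sigma2=25)
print(round(needed, 1))
```

A smaller p (a stricter SLA) or a larger forecast-error variance both push the required capacity further above the mean forecast.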

Management Algorithm

Management Algorithm (2)

Management Algorithm (3)

Management Algorithm (4)

Simulations 

Simulated using traces gathered from hundreds of production servers using various applications



Traces contain CPU, memory, storage, and network 

We are only focusing on CPU usage



Samples were collected every 15 minutes



The simulated study 

Verifies that the MFR meets SLA targets



Quantifies the reduction of SLA violations



Quantifies the number of saved machines



Explores the relationship between the remapping interval and the gain from dynamic management



Performs measurements to determine properties of a practical infrastructure with respect to migration of VMs

Overflows vs Number of PMs

Number of Machines vs Overflow Desired

Significantly reduces number of machines active

Performance degrades as the migration interval increases

Essentially, the demand planned for is the maximum usage predicted within the interval, so a longer interval forces a more conservative allocation

Limitations 

The paper only considers a single resource

In this case, CPU utilization



In the real world, allocations must account for numerous resources

Memory, CPU, I/O, network, etc.



Assumes bandwidth between machines is free and unrestricted

Relocating some VMs in some cases may not be worth the cost of relocating the image

Their study size is small 

Only 6 physical machines



What if different VMs have different SLA requirements?



What if your PMs had differing hardware?

Conclusion



Based on the simulated data, dynamic placement significantly reduces the cost of running virtual machines



Relies on an ideal case of VMs 

Predictable and volatile usage



Algorithm could be optimized to reduce the number of VM relocations, or to schedule them more efficiently



Simulation is too small 

The paper claims a 44% average savings in the number of active PMs
