Slides

January 5, 2018 | Author: Anonymous | Category: Business, Economics



MANAGING DISTRIBUTED UPS ENERGY FOR EFFECTIVE POWER CAPPING IN DATA CENTERS

Vasileios Kontorinis, L. Zhang, B. Aksanli, J. Sampson, H. Homayoun, E. Pettis*, D. Tullsen, T. Rosing

*Google

ISCA 2012

UCSD

Datacenter market is growing 2



World is becoming more IT dependent. 

Internet users increased from 16% to 30% of world population in 5 years [Internet World Stats]



Smart phones are projected to jump from 500M in 2011 to 2B in 2015 [Inter.Telecom.Union]



Internet heavily depends on Datacenters 

Data center power will double in 5 years



Expected worldwide datacenter investment in 2012: $35B (equivalent to the GDP of Lithuania) [DataCenterDynamics]

Important to build cost-effective Datacenters

Power Oversubscription - Opportunity 3

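[Figure: total cost of ownership (TCO) per server with no oversubscription and with oversubscription. TCO combines one-time capital expenses (servers, supporting equipment) and recurring costs; oversubscription adds more servers, and so more server cost, to the same datacenter.]

Power oversubscription places more servers on the same infrastructure, making data centers more cost-effective.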

Power Oversubscription – Opportunity 4

 

Assumptions [Barroso et al. + APC TCO calc]:

Server cost: $1500
Servers: 28000 (10 MW)
Energy: 4.7 c/kWh
Power: $12/kW
Amortization time: data center 10 years, servers 4 years
Distributed LA-based UPS

Available at:

http://cseweb.ucsd.edu/~tullsen/DCmodeling.html

[Pie chart: TCO breakdown]

Server Depreciation: 40.6%
Rest: 11.9%
Utility Energy: 11.7%
DC opex: 9.9%
Power Infrastructure: 7.9%
Utility Peak: 5.5%
Facility Space: 4.5%
Cooling Infrastructure: 3.3%
PUE overhead: 2.6%
Server Opex: 2.0%
UPS LA: 0.2%

Power Oversubscription using Stored Energy 5

[Figure: a diurnal power profile over one week (M to Su) and its pulse-model approximation, which alternates a peak power pulse and a low power pulse. UPS stored energy supplies the peak power reduction. Axes: power versus time.]

Leverage diurnal patterns of web services
Discharge UPS batteries during high activity (once per day)
Recharge during low activity (once per day)
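The discharge/recharge idea on this slide can be sketched as a small simulation. This is a minimal sketch under assumed conditions, not the authors' model: the battery is ideal (no conversion losses, no rate limits), the hourly profile is hypothetical, and the names `shave_peaks`, `cap_kw` and `capacity_kwh` are introduced here for illustration only.

```python
def shave_peaks(power_kw, cap_kw, capacity_kwh, step_h=1.0):
    """Cap grid draw at cap_kw using an ideal battery (assumption: no losses).

    Returns (grid draw per step in kW, stored energy after each step in kWh).
    """
    stored = capacity_kwh  # start fully charged
    grid, levels = [], []
    for demand in power_kw:
        if demand > cap_kw:
            # High activity: the battery covers as much of the excess as it can.
            discharge = min((demand - cap_kw) * step_h, stored)
            stored -= discharge
            grid.append(demand - discharge / step_h)
        else:
            # Low activity: recharge using only the headroom below the cap.
            charge = min((cap_kw - demand) * step_h, capacity_kwh - stored)
            stored += charge
            grid.append(demand + charge / step_h)
        levels.append(stored)
    return grid, levels


# Hypothetical 24-hour profile in kW: night, morning, daily peak, evening, night.
profile = [60] * 8 + [80] * 4 + [100] * 6 + [80] * 4 + [60] * 2
grid, levels = shave_peaks(profile, cap_kw=85, capacity_kwh=100)
print(max(profile), max(grid), min(levels))
```

In this hypothetical profile the battery is large enough to hold the cap through the whole peak; with a smaller `capacity_kwh` the grid draw would exceed the cap once the stored energy runs out.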

Centralized UPS 6

Used in most small / medium data centers
Scales poorly
High losses in AC-DC-AC conversion (5-10%)
Centralized single point of failure, requires redundancy

Increasingly cost-inefficient for large data centers

Distributed UPS 7

Used in large data centers
Scales with data center size
Avoids AC-DC-AC conversion
Distributed points of failure

Cheaper UPS solution

[Pictured examples: Facebook, Google]

Related work and our proposal 8

[Figure: power delivery hierarchy, from the utility and diesel generator through the UPS and PDUs down to the racks.]

Centralized UPSs for power capping [Govindan, ISCA 2011]
Distributed UPSs for rare power emergencies [Govindan, ASPLOS 2012]

Our proposal:
Provision distributed UPS for peak power capping
Different battery technology
Shave power on daily basis
Place more servers under same power infrastructure

Better amortize capex costs

Outline 9

Introduction
Choosing the right battery for power shaving
Datacenter workload and power modeling
Policies and results
Conclusions


Competing Battery Technologies 11



Lead Acid (LA)



Lithium Cobalt Oxide (LCO)



Lithium Iron Phosphate (LFP)

Electric

Metrics 12

Backup
UPS batteries rarely used (3-4 times per year)
Proper metrics:
  Cost: Wh / $
  Size: volumetric density (Wh / liter)

Backup + peak shaving
UPS batteries used on a daily basis
Proper metrics:
  Charge cycles and cost: Wh * cycles / $
  Size: volumetric density (Wh / liter)
  Recharge speed: % charge / hour
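The difference between the two sets of metrics can be illustrated with a short calculation. This is only a sketch: the two batteries below are hypothetical placeholders, not data from the slides or from any real product, and the names `Battery`, `backup_cost_metric` and `shaving_cost_metric` are introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass
class Battery:
    name: str
    energy_wh: float   # usable energy
    cost_usd: float    # purchase cost
    cycles: int        # charge cycles over the battery's life
    volume_l: float    # physical volume in liters


def backup_cost_metric(b):
    """Backup-only cost metric: Wh / $."""
    return b.energy_wh / b.cost_usd


def shaving_cost_metric(b):
    """Backup + peak shaving cost metric: Wh * cycles / $."""
    return b.energy_wh * b.cycles / b.cost_usd


def volumetric_density(b):
    """Size metric: Wh / liter."""
    return b.energy_wh / b.volume_l


# Hypothetical placeholder values, chosen only to show that the ranking can flip.
battery_a = Battery("battery_a", energy_wh=1000.0, cost_usd=100.0, cycles=500, volume_l=12.0)
battery_b = Battery("battery_b", energy_wh=1000.0, cost_usd=250.0, cycles=3000, volume_l=6.0)

for b in (battery_a, battery_b):
    print(b.name, backup_cost_metric(b), shaving_cost_metric(b), volumetric_density(b))
```

With these placeholder numbers the cheaper battery wins on Wh / $ while the longer-lived one wins on Wh * cycles / $, which is the kind of reversal the daily-use metric is meant to expose.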

Battery Technology Comparison 13

Backup: Lead Acid (cheaper)
Backup + peak shaving: Lithium Iron Phosphate (cost-effective)

Battery Capacity-Cost Estimation 14

[Figure: power versus time, marking the peak duration, the peak reduction, and the shaved energy E shaved, alongside the resulting LFP and Lead Acid batteries.]

Assumptions 15

Number of servers: 28K
Server type: Custom Sun Fire X4270
  Intel Xeon (8-core), 8 GB memory
  Idle power: 175W
  Max power: 350W
PSU efficiency: 80%
Workload: Pulse model, 50% utilization
Batteries: LFP ($5/Ah), LA ($2/Ah)

TCO savings with peak duration 16

[Figure: TCO savings versus peak duration for LFP and LA batteries, with the size constraint of each technology marked.]

The more we shave, the more we gain!
LFP is more space- and energy-efficient than LA, so it can shave more.

TCO savings with battery DoD 17



When shaving the same energy:

[Figure: TCO savings from low DoD to high DoD for (a) LA and (b) LFP.]

Sweet DoD spot for TCO savings (LA: 40%, LFP: 60%)

Key points for battery selection 18

When using batteries for peak power shaving:
Shave as much power as possible (with a reasonably sized battery)
There is a DoD sweet spot that maximizes TCO savings
LFP is the better technology because it offers:
  many more recharge cycles
  more efficient discharge
  higher energy density
  lower expected cost in the future

What if:
- Servers with unbalanced load?
- Day-to-day variation in demand?


Workload Modeling 20

Whole year traffic data from Google Transparency Report
Apply weights according to web presence: Search 29.2%, Social Networking 55.8%, Map Reduce 15%
Present results for 3 worst consecutive days (11/17/2010-11/19/2010)

Workload Modeling (cont.) 21

 

Model a 1000-machine cluster with 5 PDUs, 10 racks per PDU, and 20 servers (2U) per rack.
We simulate load based on M/M/8 queues and scale the inter-arrival time according to workload traffic.

[Figure: jobs arrive, separated by the inter-arrival time, at a scheduler (Round Robin or Load-aware) that dispatches them to per-server queues. Each server has 8 cores (consumers), and each job occupies a core for its service time.]
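The M/M/8 load model can be sketched for a single server as follows. This is a minimal sketch, not the authors' simulator: the arrival and service rates are hypothetical, the queue is assumed to be first-come first-served, and the linear interpolation between the 175W idle and 350W max power is an assumption made here to turn utilization into power.

```python
import heapq
import random


def simulate_mmc(arrival_rate, service_rate, cores=8, jobs=20000, seed=0):
    """Simulate an M/M/c FCFS queue and return the average core utilization."""
    rng = random.Random(seed)
    free_at = [0.0] * cores   # min-heap of the times at which each core becomes free
    heapq.heapify(free_at)
    now = 0.0
    busy_time = 0.0
    last_finish = 0.0
    for _ in range(jobs):
        now += rng.expovariate(arrival_rate)        # exponential inter-arrival time
        start = max(now, heapq.heappop(free_at))    # wait for the earliest free core
        service = rng.expovariate(service_rate)     # exponential service time
        finish = start + service
        heapq.heappush(free_at, finish)
        busy_time += service
        last_finish = max(last_finish, finish)
    return busy_time / (cores * last_finish)


def server_power_w(utilization, idle_w=175.0, max_w=350.0):
    """Assumption: power grows linearly from idle to max with utilization."""
    return idle_w + (max_w - idle_w) * utilization


base_rate = 8.0        # hypothetical arrival rate at full traffic (jobs per time unit)
traffic_level = 0.5    # hypothetical normalized traffic taken from the workload trace
utilization = simulate_mmc(base_rate * traffic_level, service_rate=1.0)
print(round(utilization, 2), round(server_power_w(utilization), 1))
```

Scaling `arrival_rate` by the traffic level is equivalent to scaling the mean inter-arrival time by its inverse, which is how the sketch follows the traffic trace.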


Policy goals 23

Guarantee the power budget at a specific level of the power hierarchy
Discharge only during high activity, charge only during low activity
Be effective irrespective of job scheduling
Use batteries uniformly

Uncoordinated Policy 24

[Figure: per-server battery state machine with states Available, In Use, Not Available, and Recharge. Transition labels: Power over Threshold; Power below Threshold; Reached DoD Goal; (Power + Bat. Recharge Power) below Threshold; Recharge Complete.]

Applied at the server level
Easy to implement
Runs independently per server
DoD goal set to 60% of battery capacity (LFP)
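The per-server policy can be sketched as a small state machine. The states and transition labels come from the slide, but the figure's arrows did not survive extraction, so which label connects which pair of states is an assumption of this sketch; the function name `next_state` and its parameters are hypothetical.

```python
from enum import Enum


class State(Enum):
    AVAILABLE = "Available"
    IN_USE = "In Use"
    NOT_AVAILABLE = "Not Available"
    RECHARGE = "Recharge"


def next_state(state, power, threshold, recharge_power, dod,
               dod_goal=0.6, recharge_complete=False):
    """One step of the uncoordinated policy (assumed transition mapping)."""
    if state is State.AVAILABLE and power > threshold:
        return State.IN_USE                      # Power over Threshold
    if state is State.IN_USE:
        if dod >= dod_goal:
            return State.NOT_AVAILABLE           # Reached DoD Goal
        if power < threshold:
            return State.AVAILABLE               # Power below Threshold
    if state is State.NOT_AVAILABLE and power + recharge_power < threshold:
        return State.RECHARGE                    # (Power + Bat. Recharge Power) below Threshold
    if state is State.RECHARGE and recharge_complete:
        return State.AVAILABLE                   # Recharge Complete
    return state


# Hypothetical walk through one day for a single server (power values in watts).
s = State.AVAILABLE
s = next_state(s, power=320, threshold=300, recharge_power=40, dod=0.0)
s = next_state(s, power=320, threshold=300, recharge_power=40, dod=0.6)
s = next_state(s, power=200, threshold=300, recharge_power=40, dod=0.6)
s = next_state(s, power=200, threshold=300, recharge_power=40, dod=0.0, recharge_complete=True)
print(s)
```

Because every server runs this logic on its own power reading, nothing prevents many batteries from discharging or recharging at the same moment.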

Uncoordinated Policy Results 25

Round Robin Scheduling 

Batteries discharge when not required



Batteries recharge during peak



Fails to guarantee budget

Budget violation

Uncoordinated Policy Results (cont.) 26

Load-aware Scheduling 

Batteries discharge all together (wasteful)



Recharge all together (violates budget)



Fails to guarantee budget

Coordination is required!!

Budget violation

Coordinated Control 27

Applied at higher levels (PDU, Cluster)
Requires remote battery enable/disable, initiate recharge
Number of batteries enabled proportional to peak magnitude
Batteries used spatially distributed

[Figure: overall power over Day 1 to Day 3, with peaks annotated in server equivalents (0, 100, 200, 200 and 300), and the enabled batteries spread across racks (rack1, rack2).]
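The two coordination rules, enabling batteries in proportion to the peak and spreading them across racks, can be sketched as below. This is a hedged illustration, not the authors' distributed algorithm: it assumes that a server switched to its battery removes its whole draw from the grid, and the names `batteries_to_enable` and `pick_batteries` are introduced here.

```python
import math


def batteries_to_enable(total_power_w, budget_w, per_server_power_w):
    """Number of servers to run from battery so the rest fit the power budget.

    Assumption: a server on battery draws nothing from the grid.
    """
    excess = total_power_w - budget_w
    if excess <= 0:
        return 0
    return math.ceil(excess / per_server_power_w)


def pick_batteries(count, racks):
    """Pick `count` available batteries, cycling over racks to spread them out.

    racks maps a rack id to the list of server ids whose battery is available.
    """
    queues = {rack: list(servers) for rack, servers in racks.items()}
    chosen = []
    while len(chosen) < count and any(queues.values()):
        for rack, servers in queues.items():
            if servers and len(chosen) < count:
                chosen.append((rack, servers.pop(0)))
    return chosen


# Hypothetical PDU: 275 kW drawn against a 250 kW budget, 250 W per server.
needed = batteries_to_enable(275_000, 250_000, 250)
print(needed)
print(pick_batteries(3, {"rack1": [1, 2], "rack2": [3, 4]}))
```

A fuller controller would also rotate which batteries are picked from day to day so that usage stays uniform, which this sketch leaves out.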

Coordinated Policies 28

[Figure panels: PDU-level, Cluster-level]

Power cap close to the (ideal) average power of 250W
Peak power reduction of 19%
23% more servers
6.2% TCO/server reduction
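One way to read the link between the 19% peak reduction and the 23% server increase is the arithmetic below. It is a sketch under one assumption not stated on the slide: the freed power headroom is filled entirely with identical servers, so the provisioned power stays the same. The function name `extra_server_fraction` is hypothetical.

```python
def extra_server_fraction(peak_reduction):
    """Fraction of additional servers that fit under the same power budget.

    If each server's peak contribution drops by `peak_reduction`, then
    N_new * (1 - peak_reduction) = N_old, so N_new / N_old = 1 / (1 - peak_reduction).
    """
    return 1.0 / (1.0 - peak_reduction) - 1.0


more = extra_server_fraction(0.19)
print(f"{more:.1%} more servers")
```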

Discussion: Energy proportionality 29

Sharper, thinner peaks
We can shave more power with the same stored energy

[Figure: overall power over Day 1 to Day 3 for modern servers and for energy-proportional servers.]

Peak power reduction of up to 37.5% with the 40Ah LFP battery

Concluding remarks 30

Provisioning batteries in distributed UPS topologies to cap power and oversubscribe the data center is beneficial
It is critical to reconsider battery properties (technology, capacity, DoD)
Coordination of charges and discharges is required
We cap peak power by 19%, allow 23% more servers, and better amortize capex costs
We achieve a 6.2% reduction in TCO/server ($15M for a 28k-server data center)

31

BACKUP SLIDES

TCO savings with battery cost 32

 

LA is a stable technology
LFP advancements are expected, due to electric vehicles

TCO savings increase over time with LFP!

When things go wrong? 33



Scenario 1: Unexpected daily traffic
We use the additional 35% capacity in our batteries (DoD optimized for TCO savings at 60%)

Scenario 2: Batteries are not replaced immediately
With 50% of batteries dead we can still reduce peak by 15%
Grouping battery maintenance/replacement for cost savings is possible

Exploration of Dead Batteries 34

Discussion: DVFS 35

To DVFS or not to DVFS?
Datacenter SLA violations are likely during peak load
DVFS is bad during high demand
DVFS is great during low demand
DVFS creates higher margins for aggressive battery capping

[Figure: overall power over Day 1 to Day 3 with DVFS and with no DVFS, annotated "Potential SLA violation" and "SLA violation unlikely".]

Battery Capacity-Cost Estimation 36

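[Figure: power versus time, marking the peak duration, the peak reduction, and the shaved energy E shaved.]

$$E_{\text{datacenter,shaved}} = \text{PeakReduction} \times \text{PeakDuration}$$

$$E_{\text{server,shaved}} = \frac{E_{\text{datacenter,shaved}} \times \text{PSUEff}}{\#\text{servers}}$$

$$C_{\text{battery}} = \frac{1}{\text{DoD}} \times \frac{1}{0.8} \times \frac{E_{\text{server,shaved}}}{V} \times I^{PE-1}$$

$$\text{UPSdepreciation} = \frac{C_{\text{battery}} \times \text{CostPerAh} \times \#\text{servers}}{\min\left(\text{servicelife},\ \text{cycles}(\text{DoD}) / 30\right)}$$

[Figure: resulting battery sizes for LFP and Lead Acid (~twice the volume).]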
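The sizing relations on this slide can be sketched as plain functions. The reading of the garbled capacity formula as (1/DoD) * (1/0.8) * (E_server,shaved / V) * I^(PE-1), with PE taken as the Peukert exponent, is an assumption of this sketch, as is treating the service life in months so that cycles / 30 is comparable to it. All example inputs except the 28k servers, the 60% DoD and the $5/Ah LFP price are hypothetical.

```python
def server_energy_shaved_wh(peak_reduction_w, peak_duration_h, psu_eff, n_servers):
    """Energy each server's battery must supply per peak, in Wh."""
    e_datacenter = peak_reduction_w * peak_duration_h
    return e_datacenter * psu_eff / n_servers


def battery_capacity_ah(e_server_wh, voltage_v, current_a, peukert, dod, derating=0.8):
    """Battery capacity in Ah (assumed reading of the slide's formula)."""
    return (1.0 / dod) * (1.0 / derating) * (e_server_wh / voltage_v) * current_a ** (peukert - 1.0)


def ups_depreciation_per_month(capacity_ah, cost_per_ah, n_servers,
                               service_life_months, cycles_at_dod):
    """Monthly UPS depreciation, assuming one discharge cycle per day."""
    lifetime_months = min(service_life_months, cycles_at_dod / 30.0)
    return capacity_ah * cost_per_ah * n_servers / lifetime_months


# Hypothetical inputs: 57.6 Wh per server, 12 V battery, 5 A discharge,
# ideal Peukert exponent of 1.0, 10-year service life, 3000 cycles at 60% DoD.
capacity = battery_capacity_ah(57.6, voltage_v=12.0, current_a=5.0, peukert=1.0, dod=0.6)
monthly = ups_depreciation_per_month(capacity, cost_per_ah=5.0, n_servers=28000,
                                     service_life_months=120, cycles_at_dod=3000)
print(round(capacity, 2), round(monthly, 2))
```

With a Peukert exponent above 1.0 the required capacity grows with the discharge current, which is one reason battery chemistry affects the size needed for the same shaved energy.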

Battery Related Assumptions 37

Workload partitioning 38

Distributed Algorithm 39
