Spatial data management over flash memory
Ioannis Koltsidas and Stratis D. Viglas
SSTD 2011, Minneapolis, MN
Flash: a disruptive technology
- Orders of magnitude better performance than HDD
- Low power consumption
- Dropping prices
- Idea: throw away HDDs and replace everything with flash SSDs
  - Not enough capacity
  - Not enough money to buy the not-enough-capacity
- However, flash technology is being forced upon us anyway:
  - Mobile devices
  - Low-power data centers and clusters
  - Potentially all application areas dealing with spatial data
- We must seamlessly integrate flash into the storage hierarchy
- Need custom, flash-aware solutions
Outline
- Flash-based device design
  - Flash memory
  - Solid state drives
- Spatial data challenges
  - Taking advantage of asymmetry
  - Storage and indexing
  - Buffering and caching
Flash memory cells
- Flash cell: a floating gate transistor
[Figure: cross-section of a floating-gate MOSFET: control gate, float gate above the oxide layer, N-type source and drain in a P-type silicon substrate; in the array, cells connect to a source line and a bit line]
- Two states: float gate charged or not ('0' or '1')
  - Electrons get trapped in the float gate
  - The charge changes the threshold voltage (VT) of the cell
- To read: apply a voltage between the possible VT values
  - The MOSFET channel conducts ('1'), or it remains insulating ('0')
- After a number of program/erase cycles, the oxide wears out
- Single-Level-Cell (SLC): one bit per cell
- Multi-Level-Cell (MLC): two or more bits per cell
  - The cell can sense the amount of current flow
  - Programming takes longer, puts more strain on the oxide
Flash memory arrays
- NOR or NAND flash, depending on how the cells are connected to form arrays
- Flash page: the unit of read/program operations (typically 2 kB - 8 kB)
- Flash block: the unit of erase operations (typically 32 - 128 pages)
- Before a page can be re-programmed, the whole block has to be erased first
- Reading a page is much faster than writing one
  - It takes some time before the cell charge reaches a stable state
- Erasing takes two orders of magnitude more time than reading

                          Consumer MLC   Enterprise MLC   SLC
  Page Read (μs)          50             50               25
  Page Program (μs)       900            1500             250
  Block Erase (μs)        2000-5000      2000-5000        1500-2000
  Endurance (P/E cycles)  ~3K-5K         ~30K             ~100K
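The latency gap in the table above is what later slides call the asymmetry factor x. A quick back-of-the-envelope check, using the numbers from the table directly (the dictionary layout is ours, for illustration only):

```python
# Per-operation latencies from the table, in microseconds:
# (page read, page program, (block erase min, block erase max))
latencies_us = {
    "cMLC": (50, 900, (2000, 5000)),
    "eMLC": (50, 1500, (2000, 5000)),
    "SLC":  (25, 250, (1500, 2000)),
}

for chip, (read, program, erase) in latencies_us.items():
    x = program / read  # how many page reads "fit" into one page program
    print(f"{chip}: program is {x:.0f}x slower than read; "
          f"erase is {erase[0] / read:.0f}-{erase[1] / read:.0f}x slower")
```

For SLC the program/read asymmetry works out to 10x, for cMLC 18x, and for eMLC 30x, while an erase costs 40-100x a read: the "two orders of magnitude" mentioned above.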
Flash-based Solid State Drives (SSDs)
- Common I/O interface
  - Block-addressable interface
- No mechanical latency
  - Access latency independent of the access pattern
  - 30 to 50 times more efficient than HDDs in IOPS/$ per GB
- Read/write asymmetry
  - Reads are faster than writes
  - Erase-before-write limitation
- Limited endurance and the need for wear leveling
  - 5-year warranty for enterprise SSDs (assuming 10 complete re-writes per day)
- Energy efficiency
  - 100 - 200 times more efficient than HDDs in IOPS/Watt
- Physical properties
  - Resistance to extreme shock, vibration, temperature, altitude
  - Near-instant start-up time
SSD challenges
- Host interface mismatch
  - Flash memory: read_flash_page, program_flash_page, erase_flash_block
  - Typical block device interface: read_sector, write_sector
- Writes in place would kill performance and lifetime
- Solution: perform writes out-of-place
  - Amortize block erasures over many write operations
  - Writes go to spare, erased blocks; old pages are invalidated
  - Device logical block address (LBA) space ≠ physical block address (PBA) space
- Flash Translation Layer (FTL)
  - Address translation (logical-to-physical mapping)
  - Garbage collection (block reclamation)
  - Wear-leveling
[Figure: logical pages in the LBA space at the device level are mapped by the FTL to flash pages and blocks in the PBA space at the flash chip level; some blocks are kept aside as spare capacity]
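The out-of-place write scheme above can be sketched in a few lines. This is a toy page-level FTL of our own devising (real FTLs vary widely and hide their details, as noted later in the talk); garbage collection and wear-leveling are omitted:

```python
# Toy page-level FTL sketch (illustrative, not any vendor's algorithm):
# writes go out-of-place into spare erased blocks, and a logical-to-physical
# map tracks the current location of each logical page.

PAGES_PER_BLOCK = 4  # real blocks hold 32-128 pages; 4 keeps the demo small

class ToyFTL:
    def __init__(self, num_blocks):
        self.l2p = {}                          # logical page -> (block, page)
        self.valid = set()                     # physical pages with live data
        self.free_blocks = list(range(num_blocks))
        self.active = self.free_blocks.pop(0)  # block currently written to
        self.next_page = 0

    def write(self, lpn):
        # Invalidate the old physical copy instead of erasing in place.
        old = self.l2p.get(lpn)
        if old is not None:
            self.valid.discard(old)
        if self.next_page == PAGES_PER_BLOCK:  # active block is full
            self.active = self.free_blocks.pop(0)
            self.next_page = 0
        ppa = (self.active, self.next_page)    # program the next erased page
        self.next_page += 1
        self.l2p[lpn] = ppa
        self.valid.add(ppa)
        return ppa

ftl = ToyFTL(num_blocks=8)
ftl.write(0)   # logical page 0 lands at (block 0, page 0)
ftl.write(0)   # rewrite: old copy invalidated, new copy at (block 0, page 1)
```

Invalidated pages accumulate until garbage collection copies the remaining valid pages elsewhere and erases the block, which is exactly how erasures get amortized over many writes.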
Off-the-shelf SSDs

                           A            B            C            D            E
  Form Factor              PATA Drive   SATA Drive   SATA Drive   SAS Drive    PCI-e card
  Class                    Consumer     Consumer     Consumer     Enterprise   Enterprise
  Flash Chips              MLC          MLC          MLC          SLC          SLC
  Capacity                 32 GB        100 GB       160 GB       140 GB       450 GB
  Read Bandwidth           53 MB/s      285 MB/s     250 MB/s     220 MB/s     700 MB/s
  Write Bandwidth          28 MB/s      250 MB/s     100 MB/s     115 MB/s     500 MB/s
  Random 4kB Read IOPS     3.5k         30k          35k          45k          140k
  Random 4kB Write IOPS    0.01k        10k          0.6k         16k          70k
  Street Price             ~15 $/GB     ~4 $/GB      ~2.5 $/GB    ~18 $/GB     ~38 $/GB
                           (2007)       (2010)       (2010)       (2011)       (2009)

- Random read IOPS span ~1 order of magnitude across devices; random write IOPS span more than 2 orders of magnitude
- For comparison: a 15k RPM SAS HDD delivers ~250-300 IOPS; a 7.2k RPM SATA HDD ~80 IOPS
Work so far: better FTL algorithms
- Hide the complexity from the user by adding intelligence at the controller level
- Great! (for the majority of user-level applications)
- But, as is usually the case, there is no one-size-fits-all solution
  - Data management applications have a much better understanding of access patterns; file systems don't
  - Spatial data management has even more specific needs
Competing goals
- SSD designers assume a generic file system above the device. Goals:
  - Hide the complexities of flash memory
  - Improve performance for generic workloads and I/O patterns
  - Protect their competitive advantage by hiding algorithm and implementation details
- DBMS designers have full control of the I/O issued to the device. Goals:
  - Predictability of I/O operations, independence from hardware specifics
  - Clear characterization of I/O patterns
  - Exploit synergies between query processing and flash memory properties
A (modest) proposal for areas to focus on
- Data structure level
  - Ways of helping the FTL
  - Introduce imbalance into tree structures
  - Trade (cheap) reads for (expensive) writes
- Memory management
  - Add spatial intelligence to the buffer pool
  - Take advantage of work on spatial trajectory prediction
  - Combine with cost-based replacement
  - Prefetch data, delay expensive writes
Turning asymmetry into an advantage
- Common characteristic of all SSDs: low random read latency
- Write speed and throughput differ dramatically across types of device
  - Sometimes write speed is orders of magnitude slower than read speed
- Key idea: if we don't need to write, then we shouldn't
  - Procrastination might pay off in the long term
  - Only write once the cost has been expensed
Read/write asymmetry
- Consider the case where writes are x times more expensive than reads
  - For each write we avoid, we "gain" x time units
- Take any R-tree structure and introduce controlled imbalance
- Rebalance when we have expensed the cost
[Figure: original setup with a parent and an overflowing child; a balanced insertion writes a newly allocated sibling and updates the parent; an unbalanced insertion instead chains an overflow area to the overflowing child]
In more detail
- Parent P, overflowing node L
- On overflow, allocate overflow node S
  - Only the L and S nodes are written, not P
  - Instead of performing three writes (nodes P, L, and S), we perform two (nodes L and S)
  - We have saved one write, i.e., x time units
- Record a counter c at L
  - Increment it each time we traverse L to get to S
  - Once the counter reaches x, rebalance: the cost has been expensed
[Figure: L points to overflow node S and keeps counter c; when c > x, the structure is rebalanced and P is finally written]
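The bookkeeping above can be sketched for a single overflowing node. This is our own minimal rendering of the scheme (node names follow the slide; the write log and fixed capacity are illustrative assumptions), not the paper's implementation:

```python
# Deferred rebalancing on one node L: overflow entries go to an overflow
# node S instead of triggering an immediate split, so the parent P is not
# written. Each traversal through L into S increments a counter; once it
# reaches the asymmetry factor x, the saved write has been expensed and we
# rebalance (writing P at last).

class Node:
    def __init__(self, name, capacity=2):
        self.name = name
        self.capacity = capacity
        self.entries = []
        self.overflow = None   # overflow node S, if allocated
        self.counter = 0       # counter c recorded at L

writes = []  # log of node writes; each entry models one flash page write

def insert(L, entry, x):
    if len(L.entries) < L.capacity:
        L.entries.append(entry)
        writes.append(L.name)
        return
    if L.overflow is None:
        # Overflow: allocate S and link it from L.
        # Only L and S are written; the parent P is NOT updated.
        L.overflow = Node("S")
        writes.append(L.name)
    L.overflow.entries.append(entry)
    writes.append("S")
    L.counter += 1             # one extra read was paid to reach S
    if L.counter >= x:
        writes.append("P")     # cost expensed: rebalance, write the parent
        L.counter = 0

x = 3
L = Node("L")
for i in range(7):
    insert(L, i, x)
```

In this trace the parent write happens exactly once, after x traversals into S, instead of on the first overflow; reads through L into S are the (cheap) currency that pays for the deferred (expensive) write.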
Observations
- If there are no "hotspots" in the R-tree, then we have potentially huge gains
- Method is applicable either at the leaves or at the index nodes
- Likelihood of rebalancing is proportional to the level at which the imbalance was introduced (the deeper the level, the higher the likelihood)
- Counter-intuitive: the more imbalance, the lower the I/O cost
  - In the worst case, as good as a balanced tree
- Good fit for data access patterns in location-aware spatial services
  - Update rate is relatively low; point queries are highly volatile as users move about an area
- Extensions in hybrid server-oriented configurations
  - Both HDDs and SSDs are used for persistent storage
  - Write-intensive (and potentially unbalanced) nodes are placed on the HDD
Cost-based replacement
- Choice of victim depends on probability of reference (as usual)
- But the eviction cost is not uniform
  - Clean pages bear no write cost; dirty pages result in a write
  - I/O asymmetry: writes are more expensive than reads
- It doesn't hurt if we misestimate the heat of a page, so long as we save (expensive) writes
- Key idea: combine LRU-based replacement with cost-based algorithms
- Applicable in SSD-only as well as hybrid systems
In more detail
- Starting point: cost-based page replacement
- Divide the buffer pool into two regions
  - Time region: typical LRU
  - Cost region: multiple LRU queues, one per cost class, with queues ordered by cost
- Evict from the time region into the cost region
- The final victim is always taken from the cost region
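A minimal sketch of the two-region pool, assuming just two cost classes (clean = no write-back, dirty = one flash write); the class boundaries, capacities, and method names are our assumptions for illustration:

```python
# Two-region buffer pool sketch: the time region is plain LRU; pages
# falling off it are demoted into per-cost-class LRU queues (the cost
# region), and the final victim always comes from the cheapest non-empty
# cost class.

from collections import OrderedDict

class TwoRegionPool:
    def __init__(self, time_cap, cost_cap):
        self.time = OrderedDict()   # page -> dirty flag, in LRU order
        self.cost = {0: OrderedDict(), 1: OrderedDict()}  # 0=clean, 1=dirty
        self.time_cap, self.cost_cap = time_cap, cost_cap

    def access(self, page, dirty=False):
        # A hit anywhere promotes the page back to the time region.
        for q in (self.time, *self.cost.values()):
            if page in q:
                dirty = dirty or q.pop(page)
                break
        self.time[page] = dirty
        self.time.move_to_end(page)
        if len(self.time) > self.time_cap:   # demote LRU page to cost region
            victim, d = self.time.popitem(last=False)
            self.cost[int(d)][victim] = d
        if sum(map(len, self.cost.values())) > self.cost_cap:
            return self.evict()

    def evict(self):
        # Cheapest class first: clean pages cost nothing to drop,
        # dirty pages would cost an (expensive) flash write.
        for c in sorted(self.cost):
            if self.cost[c]:
                return self.cost[c].popitem(last=False)

pool = TwoRegionPool(time_cap=2, cost_cap=2)
pool.access("a", dirty=True)
pool.access("b")
pool.access("c")             # "a" (dirty) demoted to the cost region
pool.access("d")             # "b" (clean) demoted
evicted = pool.access("e")   # cost region overflows: clean "b" goes first
```

Even though the dirty page "a" is colder than "b", the clean page is evicted first; this is the "misestimating heat doesn't hurt, so long as we save writes" point from the previous slide.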
Location-awareness
- A host of work in wireless networks deals with trajectory prediction
- Consider the case where services are offered based on user location
  - Primary data are stored in an R-tree
  - User location triggers queries on the R-tree
- User motion creates hotspots (more precisely, hot paths) on the tree structure
Location-aware buffer pool management
- What if the classes of the cost segment track user motion?
  - The lower the utility of keeping a page in the buffer pool, the better a candidate it is for eviction
  - Utility is correlated with the motion trajectory
  - As the user moves about an area, new pages are brought into the buffer pool and older pages are evicted
- Potentially huge savings if the trajectory is tracked accurately enough
- Flash mobs (pun intended!)
  - Users tend to move in groups into areas of interest
  - Overall response time of the system is minimized
  - Recency/frequency of access may not be able to predict future behavior; trajectory tracking potentially will
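One hedged way to picture trajectory-driven utility (everything here is our assumption: the linear motion predictor, the MBR distance as a utility proxy, and all names):

```python
# Trajectory-aware victim selection sketch: a page's utility is high when
# its MBR lies near the user's predicted position, so the page farthest
# from the predicted path is the best eviction candidate.

def predict_position(pos, velocity, dt=1.0):
    # Simplest possible predictor: linear extrapolation of recent motion.
    return (pos[0] + velocity[0] * dt, pos[1] + velocity[1] * dt)

def mbr_distance(mbr, point):
    # Distance from a point to an axis-aligned MBR (xmin, ymin, xmax, ymax);
    # zero if the point lies inside the rectangle.
    xmin, ymin, xmax, ymax = mbr
    dx = max(xmin - point[0], 0, point[0] - xmax)
    dy = max(ymin - point[1], 0, point[1] - ymax)
    return (dx * dx + dy * dy) ** 0.5

def pick_victim(pages, pos, velocity):
    # pages: {page_id: mbr}. Evict the page least likely to serve the
    # queries the predicted motion will trigger.
    nxt = predict_position(pos, velocity)
    return max(pages, key=lambda p: mbr_distance(pages[p], nxt))

pages = {"downtown": (0, 0, 2, 2), "airport": (8, 8, 10, 10)}
# A user at (1, 1) moving within downtown: the airport page is the victim.
victim = pick_victim(pages, (1, 1), (0.5, 0))
```

In a full system this distance would feed the cost classes of the previous slides rather than pick the victim outright, and the predictor would come from the wireless-networks trajectory work cited above.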
Conclusions and outlook
- Flash memory and SSDs are becoming ubiquitous
  - Both at the mobile device and at the enterprise level
- Need for new data structures and algorithms
  - Existing ones target the memory-disk performance bottleneck
  - That bottleneck is smaller with SSDs; a new bottleneck has appeared: read/write asymmetry
- Introduce imbalance at the data structure level
  - Trade reads for writes through the allocation of overflow nodes
- Take cost into account when managing main memory
  - Cost-based replacement based on motion tracking and trajectory prediction