Spatial data management over flash memory
Ioannis Koltsidas and Stratis D. Viglas
SSTD 2011, Minneapolis, MN
Flash: a disruptive technology
- Orders of magnitude better performance than HDD
- Low power consumption
- Dropping prices
- Idea: throw away HDDs and replace everything with flash SSDs
  - Not enough capacity
  - Not enough money to buy the not-enough-capacity
- However, flash technology is being forced upon us anyway:
  - Mobile devices
  - Low-power data centers and clusters
  - Potentially all application areas dealing with spatial data
- We must seamlessly integrate flash into the storage hierarchy
- Need custom, flash-aware solutions
Outline
- Flash-based device design
  - Flash memory
  - Solid state drives
- Spatial data challenges
  - Taking advantage of asymmetry
  - Storage and indexing
  - Buffering and caching
Flash memory cells
- Flash cell: a floating gate transistor
[Figure: cross-section of a floating-gate MOSFET: control gate, float gate above the oxide layer, N-type source and drain in a P-type silicon substrate; in the array, cells connect to a source line and a bit line]
- Two states: float gate charged or not ('0' or '1')
  - Electrons get trapped in the float gate
  - The charge changes the threshold voltage (VT) of the cell
- To read: apply a voltage between the possible VT values
  - The MOSFET channel conducts ('1'), or it remains insulating ('0')
- After a number of program/erase cycles, the oxide wears out
- Single-Level-Cell (SLC): one bit per cell
- Multi-Level-Cell (MLC): two or more bits per cell
  - The cell can sense the amount of current flow
  - Programming takes longer, puts more strain on the oxide
Flash memory arrays
- NOR or NAND flash, depending on how the cells are connected to form arrays
- Flash page: the unit of read/program operations (typically 2 kB - 8 kB)
- Flash block: the unit of erase operations (typically 32 - 128 pages)
- Before a page can be re-programmed, the whole block has to be erased first
- Reading a page is much faster than writing one
  - It takes some time before the cell charge reaches a stable state
- Erasing takes two orders of magnitude more time than reading

                          Consumer MLC   Enterprise MLC   SLC
  Page Read (μs)          50             50               25
  Page Program (μs)       900            1500             250
  Block Erase (μs)        2000-5000      2000-5000        1500-2000
  Endurance (P/E cycles)  ~3K-5K         ~30K             ~100K
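The latency gap in the table above is what later slides call the asymmetry factor x. A quick back-of-the-envelope check, using the numbers from the table directly (the dictionary layout is ours, for illustration only):

```python
# Per-operation latencies from the table, in microseconds:
# (page read, page program, (block erase min, block erase max))
latencies_us = {
    "cMLC": (50, 900, (2000, 5000)),
    "eMLC": (50, 1500, (2000, 5000)),
    "SLC":  (25, 250, (1500, 2000)),
}

for chip, (read, program, erase) in latencies_us.items():
    x = program / read  # how many page reads "fit" into one page program
    print(f"{chip}: program is {x:.0f}x slower than read; "
          f"erase is {erase[0] / read:.0f}-{erase[1] / read:.0f}x slower")
```

For SLC the program/read asymmetry works out to 10x, for cMLC 18x, and for eMLC 30x, while an erase costs 40-100x a read: the "two orders of magnitude" mentioned above.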
Flash-based Solid State Drives (SSDs)
- Common I/O interface
  - Block-addressable interface
- No mechanical latency
  - Access latency independent of the access pattern
  - 30 to 50 times more efficient than HDDs in IOPS/$ per GB
- Read/write asymmetry
  - Reads are faster than writes
  - Erase-before-write limitation
- Limited endurance and the need for wear leveling
  - 5-year warranty for enterprise SSDs (assuming 10 complete re-writes per day)
- Energy efficiency
  - 100 - 200 times more efficient than HDDs in IOPS/Watt
- Physical properties
  - Resistance to extreme shock, vibration, temperature, altitude
  - Near-instant start-up time
SSD challenges
- Host interface mismatch
  - Flash memory: read_flash_page, program_flash_page, erase_flash_block
  - Typical block device interface: read_sector, write_sector
- Writes in place would kill performance and lifetime
- Solution: perform writes out-of-place
  - Amortize block erasures over many write operations
  - Writes go to spare, erased blocks; old pages are invalidated
  - Device logical block address (LBA) space ≠ physical block address (PBA) space
- Flash Translation Layer (FTL)
  - Address translation (logical-to-physical mapping)
  - Garbage collection (block reclamation)
  - Wear-leveling
[Figure: logical pages in the LBA space at the device level are mapped by the FTL to flash pages and blocks in the PBA space at the flash chip level; some blocks are kept aside as spare capacity]
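The out-of-place write scheme above can be sketched in a few lines. This is a toy page-level FTL of our own devising (real FTLs vary widely and hide their details, as noted later in the talk); garbage collection and wear-leveling are omitted:

```python
# Toy page-level FTL sketch (illustrative, not any vendor's algorithm):
# writes go out-of-place into spare erased blocks, and a logical-to-physical
# map tracks the current location of each logical page.

PAGES_PER_BLOCK = 4  # real blocks hold 32-128 pages; 4 keeps the demo small

class ToyFTL:
    def __init__(self, num_blocks):
        self.l2p = {}                          # logical page -> (block, page)
        self.valid = set()                     # physical pages with live data
        self.free_blocks = list(range(num_blocks))
        self.active = self.free_blocks.pop(0)  # block currently written to
        self.next_page = 0

    def write(self, lpn):
        # Invalidate the old physical copy instead of erasing in place.
        old = self.l2p.get(lpn)
        if old is not None:
            self.valid.discard(old)
        if self.next_page == PAGES_PER_BLOCK:  # active block is full
            self.active = self.free_blocks.pop(0)
            self.next_page = 0
        ppa = (self.active, self.next_page)    # program the next erased page
        self.next_page += 1
        self.l2p[lpn] = ppa
        self.valid.add(ppa)
        return ppa

ftl = ToyFTL(num_blocks=8)
ftl.write(0)   # logical page 0 lands at (block 0, page 0)
ftl.write(0)   # rewrite: old copy invalidated, new copy at (block 0, page 1)
```

Invalidated pages accumulate until garbage collection copies the remaining valid pages elsewhere and erases the block, which is exactly how erasures get amortized over many writes.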
Off-the-shelf SSDs

                           A            B            C            D            E
  Form Factor              PATA Drive   SATA Drive   SATA Drive   SAS Drive    PCI-e card
  Class                    Consumer     Consumer     Consumer     Enterprise   Enterprise
  Flash Chips              MLC          MLC          MLC          SLC          SLC
  Capacity                 32 GB        100 GB       160 GB       140 GB       450 GB
  Read Bandwidth           53 MB/s      285 MB/s     250 MB/s     220 MB/s     700 MB/s
  Write Bandwidth          28 MB/s      250 MB/s     100 MB/s     115 MB/s     500 MB/s
  Random 4kB Read IOPS     3.5k         30k          35k          45k          140k
  Random 4kB Write IOPS    0.01k        10k          0.6k         16k          70k
  Street Price             ~15 $/GB     ~4 $/GB      ~2.5 $/GB    ~18 $/GB     ~38 $/GB
                           (2007)       (2010)       (2010)       (2011)       (2009)

- Random read IOPS span ~1 order of magnitude across devices; random write IOPS span more than 2 orders of magnitude
- For comparison: a 15k RPM SAS HDD delivers ~250-300 IOPS; a 7.2k RPM SATA HDD ~80 IOPS
Work so far: better FTL algorithms
- Hide the complexity from the user by adding intelligence at the controller level
- Great! (for the majority of user-level applications)
- But, as is usually the case, there is no one-size-fits-all solution
  - Data management applications have a much better understanding of access patterns; file systems don't
  - Spatial data management has even more specific needs
Competing goals
- SSD designers assume a generic file system above the device. Goals:
  - Hide the complexities of flash memory
  - Improve performance for generic workloads and I/O patterns
  - Protect their competitive advantage by hiding algorithm and implementation details
- DBMS designers have full control of the I/O issued to the device. Goals:
  - Predictability of I/O operations, independence from hardware specifics
  - Clear characterization of I/O patterns
  - Exploit synergies between query processing and flash memory properties
A (modest) proposal for areas to focus on
- Data structure level
  - Ways of helping the FTL
  - Introduce imbalance into tree structures
  - Trade (cheap) reads for (expensive) writes
- Memory management
  - Add spatial intelligence to the buffer pool
  - Take advantage of work on spatial trajectory prediction
  - Combine with cost-based replacement
  - Prefetch data, delay expensive writes
Turning asymmetry into an advantage
- Common characteristic of all SSDs: low random read latency
- Write speed and throughput differ dramatically across types of device
  - Sometimes write speed is orders of magnitude slower than read speed
- Key idea: if we don't need to write, then we shouldn't
  - Procrastination might pay off in the long term
  - Only write once the cost has been expensed
Read/write asymmetry
- Consider the case where writes are x times more expensive than reads
  - For each write we avoid, we "gain" x time units
- Take any R-tree structure and introduce controlled imbalance
- Rebalance when we have expensed the cost
[Figure: original setup with a parent and an overflowing child; a balanced insertion writes a newly allocated sibling and updates the parent; an unbalanced insertion instead chains an overflow area to the overflowing child]
In more detail
- Parent P, overflowing node L
- On overflow, allocate overflow node S
  - Only the L and S nodes are written, not P
  - Instead of performing three writes (nodes P, L, and S), we perform two (nodes L and S)
  - We have saved one write, i.e., x time units
- Record a counter c at L
  - Increment it each time we traverse L to get to S
  - Once the counter reaches x, rebalance: the cost has been expensed
[Figure: L points to overflow node S and keeps counter c; when c > x, the structure is rebalanced and P is finally written]
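The bookkeeping above can be sketched for a single overflowing node. This is our own minimal rendering of the scheme (node names follow the slide; the write log and fixed capacity are illustrative assumptions), not the paper's implementation:

```python
# Deferred rebalancing on one node L: overflow entries go to an overflow
# node S instead of triggering an immediate split, so the parent P is not
# written. Each traversal through L into S increments a counter; once it
# reaches the asymmetry factor x, the saved write has been expensed and we
# rebalance (writing P at last).

class Node:
    def __init__(self, name, capacity=2):
        self.name = name
        self.capacity = capacity
        self.entries = []
        self.overflow = None   # overflow node S, if allocated
        self.counter = 0       # counter c recorded at L

writes = []  # log of node writes; each entry models one flash page write

def insert(L, entry, x):
    if len(L.entries) < L.capacity:
        L.entries.append(entry)
        writes.append(L.name)
        return
    if L.overflow is None:
        # Overflow: allocate S and link it from L.
        # Only L and S are written; the parent P is NOT updated.
        L.overflow = Node("S")
        writes.append(L.name)
    L.overflow.entries.append(entry)
    writes.append("S")
    L.counter += 1             # one extra read was paid to reach S
    if L.counter >= x:
        writes.append("P")     # cost expensed: rebalance, write the parent
        L.counter = 0

x = 3
L = Node("L")
for i in range(7):
    insert(L, i, x)
```

In this trace the parent write happens exactly once, after x traversals into S, instead of on the first overflow; reads through L into S are the (cheap) currency that pays for the deferred (expensive) write.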
Observations
- If there are no "hotspots" in the R-tree, then we have potentially huge gains
- Method is applicable either at the leaves or at the index nodes
- Likelihood of rebalancing is proportional to the level at which the imbalance was introduced (the deeper the level, the higher the likelihood)
- Counter-intuitive: the more imbalance, the lower the I/O cost
  - In the worst case, as good as a balanced tree
- Good fit for data access patterns in location-aware spatial services
  - Update rate is relatively low; point queries are highly volatile as users move about an area
- Extensions in hybrid server-oriented configurations
  - Both HDDs and SSDs are used for persistent storage
  - Write-intensive (and potentially unbalanced) nodes are placed on the HDD
Cost-based replacement
- Choice of victim depends on probability of reference (as usual)
- But the eviction cost is not uniform
  - Clean pages bear no write cost; dirty pages result in a write
  - I/O asymmetry: writes are more expensive than reads
- It doesn't hurt if we misestimate the heat of a page, so long as we save (expensive) writes
- Key idea: combine LRU-based replacement with cost-based algorithms
- Applicable in SSD-only as well as hybrid systems
In more detail
- Starting point: cost-based page replacement
- Divide the buffer pool into two regions
  - Time region: typical LRU
  - Cost region: multiple LRU queues, one per cost class, with queues ordered by cost
- Evict from the time region into the cost region
- The final victim is always taken from the cost region
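A minimal sketch of the two-region pool, assuming just two cost classes (clean = no write-back, dirty = one flash write); the class boundaries, capacities, and method names are our assumptions for illustration:

```python
# Two-region buffer pool sketch: the time region is plain LRU; pages
# falling off it are demoted into per-cost-class LRU queues (the cost
# region), and the final victim always comes from the cheapest non-empty
# cost class.

from collections import OrderedDict

class TwoRegionPool:
    def __init__(self, time_cap, cost_cap):
        self.time = OrderedDict()   # page -> dirty flag, in LRU order
        self.cost = {0: OrderedDict(), 1: OrderedDict()}  # 0=clean, 1=dirty
        self.time_cap, self.cost_cap = time_cap, cost_cap

    def access(self, page, dirty=False):
        # A hit anywhere promotes the page back to the time region.
        for q in (self.time, *self.cost.values()):
            if page in q:
                dirty = dirty or q.pop(page)
                break
        self.time[page] = dirty
        self.time.move_to_end(page)
        if len(self.time) > self.time_cap:   # demote LRU page to cost region
            victim, d = self.time.popitem(last=False)
            self.cost[int(d)][victim] = d
        if sum(map(len, self.cost.values())) > self.cost_cap:
            return self.evict()

    def evict(self):
        # Cheapest class first: clean pages cost nothing to drop,
        # dirty pages would cost an (expensive) flash write.
        for c in sorted(self.cost):
            if self.cost[c]:
                return self.cost[c].popitem(last=False)

pool = TwoRegionPool(time_cap=2, cost_cap=2)
pool.access("a", dirty=True)
pool.access("b")
pool.access("c")             # "a" (dirty) demoted to the cost region
pool.access("d")             # "b" (clean) demoted
evicted = pool.access("e")   # cost region overflows: clean "b" goes first
```

Even though the dirty page "a" is colder than "b", the clean page is evicted first; this is the "misestimating heat doesn't hurt, so long as we save writes" point from the previous slide.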
Location-awareness
- A host of work in wireless networks deals with trajectory prediction
- Consider the case where services are offered based on user location
  - Primary data are stored in an R-tree
  - User location triggers queries on the R-tree
- User motion creates hotspots (more precisely, hot paths) on the tree structure
Location-aware buffer pool management
- What if the classes of the cost segment track user motion?
  - The lower the utility of keeping a page in the buffer pool, the better a candidate it is for eviction
  - Utility is correlated with the motion trajectory
  - As the user moves about an area, new pages are brought into the buffer pool and older pages are evicted
- Potentially huge savings if the trajectory is tracked accurately enough
- Flash mobs (pun intended!)
  - Users tend to move in groups into areas of interest
  - Overall response time of the system is minimized
  - Recency/frequency of access may not be able to predict future behavior; trajectory tracking potentially will
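One hedged way to picture trajectory-driven utility (everything here is our assumption: the linear motion predictor, the MBR distance as a utility proxy, and all names):

```python
# Trajectory-aware victim selection sketch: a page's utility is high when
# its MBR lies near the user's predicted position, so the page farthest
# from the predicted path is the best eviction candidate.

def predict_position(pos, velocity, dt=1.0):
    # Simplest possible predictor: linear extrapolation of recent motion.
    return (pos[0] + velocity[0] * dt, pos[1] + velocity[1] * dt)

def mbr_distance(mbr, point):
    # Distance from a point to an axis-aligned MBR (xmin, ymin, xmax, ymax);
    # zero if the point lies inside the rectangle.
    xmin, ymin, xmax, ymax = mbr
    dx = max(xmin - point[0], 0, point[0] - xmax)
    dy = max(ymin - point[1], 0, point[1] - ymax)
    return (dx * dx + dy * dy) ** 0.5

def pick_victim(pages, pos, velocity):
    # pages: {page_id: mbr}. Evict the page least likely to serve the
    # queries the predicted motion will trigger.
    nxt = predict_position(pos, velocity)
    return max(pages, key=lambda p: mbr_distance(pages[p], nxt))

pages = {"downtown": (0, 0, 2, 2), "airport": (8, 8, 10, 10)}
# A user at (1, 1) moving within downtown: the airport page is the victim.
victim = pick_victim(pages, (1, 1), (0.5, 0))
```

In a full system this distance would feed the cost classes of the previous slides rather than pick the victim outright, and the predictor would come from the wireless-networks trajectory work cited above.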
Conclusions and outlook
- Flash memory and SSDs are becoming ubiquitous
  - Both at the mobile device and at the enterprise level
- Need for new data structures and algorithms
  - Existing ones target the memory-disk performance bottleneck
  - That bottleneck is smaller with SSDs; a new bottleneck has appeared: read/write asymmetry
- Introduce imbalance at the data structure level
  - Trade reads for writes through the allocation of overflow nodes
- Take cost into account when managing main memory
  - Cost-based replacement based on motion tracking and trajectory prediction