Slides(PPT) - Storage Research Group

January 9, 2018 | Author: Anonymous | Category: Social Science, Political Science, Media

Extending the Lifetime of Flash-based Storage through Reducing Write Amplification from File Systems

Youyou Lu, Jiwu Shu, Weimin Zheng

Tsinghua University

Outline
• Background and Motivation
• Object-based Flash Translation Layer
• System Co-design with Flash Memory
• Evaluation
• Conclusion

Flash Memory
• Gained Popularity
  – High performance, low energy, reduced cost
  – Wide deployment: embedded devices -> enterprise servers
• Endurance (program/erase cycles)
  – SLC (100,000)
  – MLC (10,000)
  – TLC (1,000)
Figure from “The Bleak Future of NAND Flash Memory” in FAST’12

Existing Approaches to Flash Endurance
• Wear Leveling
  – Makes all the blocks wear out evenly
  – Fundamental part of the FTL
• Data Reduction
  – Reduces the amount of data to be written
  – Data de-duplication and compression
  – Used either in the FTL or in the FS

Write Amplification from File Systems
• Write Amplification from File Systems
  – Pre-FS writes vs. Post-FS writes
• Journaling
  – Keep the journals in the logs first,
  – and then checkpoint them in place
• Metadata synchronization
  – Frequent persistence to guard against data loss or inconsistency
• Page-aligned update
  – Wasted space within one page

A simple example in ext3
• echo “title” > foo.txt
  – Effective Data: 6 bytes
  – Flash Writes: 11 pages * 4KB/page = 44KB
• echo “texttexttext…”(4KB) >> foo.txt
  – Effective Data: 4KB
  – Flash Writes: 9 pages * 4KB/page = 36KB
[Figure: pages written to the Data Area and the Journal Area for the two commands (bmp, inode, data, dirent, C)]
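The arithmetic behind the ext3 example can be checked with a few lines of Python. This is only a sketch: the page counts and payload sizes are the ones quoted on the slide, while the function name `write_amplification` and its parameters are illustrative choices, not something from the talk.

```python
PAGE_SIZE = 4 * 1024  # 4KB flash page, as on the slide


def write_amplification(flash_pages: int, app_bytes: int,
                        page_size: int = PAGE_SIZE) -> float:
    """Bytes written to flash divided by bytes the application wrote."""
    return flash_pages * page_size / app_bytes


# Page counts and payload sizes are taken from the slide.
create_wa = write_amplification(flash_pages=11, app_bytes=6)        # echo "title" > foo.txt
append_wa = write_amplification(flash_pages=9, app_bytes=4 * 1024)  # 4KB append
total_flash_kb = (11 + 9) * PAGE_SIZE // 1024

print(f"create: {create_wa:.1f}x")
print(f"append: {append_wa:.1f}x")
print(f"total flash writes: {total_flash_kb}KB")
```

The 6-byte create is amplified roughly 7,500 times, the 4KB append 9 times, and the two commands together write 80KB to flash.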

Flash Opportunities
• No-overwrite
  – Can journaling exploit it to avoid writing twice in the file system?
• Page metadata
  – Can we store a backpointer there, so that the index can be written back lazily while keeping consistency?
• Erase-before-Update
  – Can we track the free space in a coarse-grained way?

Architecture & Data Organization
• Intelligent Storage Mgmt.
• Optimized FS Mechanism
[Figure: two storage stacks side by side, listed top to bottom]
Traditional stack:
  – Application
  – VFS Syscalls
  – File System (Namespace Mgmt. + Storage Mgmt.)
  – READ, WRITE
  – Flash Translation Layer (FTL) in a Solid State Drive (Hardware)
Object-based stack:
  – Application
  – VFS Syscalls
  – Object File System (Namespace Mgmt.)
  – OBJ_READ, OBJ_WRITE, OBJ_FLUSH, OBJ_CREATE, OBJ_DELETE, OBJ_SETATTR, OBJ_GETATTR
  – Object-based Flash Translation Layer (OFTL) (Storage Mgmt.)
  – FLASH_READ, FLASH_WRITE, FLASH_ERASE
  – Raw Flash Device (Hardware)
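To make the split between namespace management and storage management concrete, here is a toy in-memory stand-in for an object interface carrying the seven operation names from the slide. Everything about it is assumed for illustration: the class name `ToyObjectStore`, the method signatures, and the byte-array backing store are hypothetical and say nothing about how the real OFTL is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    data: bytearray = field(default_factory=bytearray)
    attrs: dict = field(default_factory=dict)


class ToyObjectStore:
    """In-memory stand-in for an object-based storage layer (hypothetical API)."""

    def __init__(self):
        self._objects = {}
        self._next_oid = 1

    def obj_create(self) -> int:
        oid = self._next_oid
        self._next_oid += 1
        self._objects[oid] = StoredObject()
        return oid

    def obj_delete(self, oid: int) -> None:
        del self._objects[oid]

    def obj_write(self, oid: int, offset: int, payload: bytes) -> None:
        obj = self._objects[oid]
        end = offset + len(payload)
        if len(obj.data) < end:  # grow the object with zero padding
            obj.data.extend(b"\0" * (end - len(obj.data)))
        obj.data[offset:end] = payload

    def obj_read(self, oid: int, offset: int, length: int) -> bytes:
        return bytes(self._objects[oid].data[offset:offset + length])

    def obj_setattr(self, oid: int, key: str, value) -> None:
        self._objects[oid].attrs[key] = value

    def obj_getattr(self, oid: int, key: str):
        return self._objects[oid].attrs[key]

    def obj_flush(self, oid: int) -> None:
        pass  # nothing to persist in this in-memory toy


# The file system layer only maps names to object ids (namespace management).
store = ToyObjectStore()
namespace = {"foo.txt": store.obj_create()}
store.obj_write(namespace["foo.txt"], 0, b"title\n")
store.obj_setattr(namespace["foo.txt"], "size", 6)
print(store.obj_read(namespace["foo.txt"], 0, 6))
```

In this reading, the object file system never sees physical addresses; placement, indexing, and free-space tracking stay below the object interface.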

OFTL Data Layout
[Figure: OFTL data layout, with the three techniques annotated on the regions they apply to]
• Root Page: MAGIC NUMBER, ADDR_BLK_INFO, ADDR_ATIME_LOGS, ADDR_OBJ_INDEX, ADDR_UW_META
• Block Info Meta and Block Info Segments
• Object Index
• Object Metadata Pages: Object Metadata, File Metadata, Layout, Diff-Layout, Diff-Data
• Object Data Pages
• Annotated techniques: Lazy Indexing, Coarse-grained Block State Maintenance, Compacted Update

T1. Lazy Indexing
• Index Metadata
  – The metadata that stores the pointers (the physical addresses of other pages).
• Object Index -> Object Metadata Page -> Object Data Page
  – Type-specific backpointers
Page layout: PAGE DATA | Page Metadata (OOB): TYPE, BACK-POINTER (oid, offset, len), Ver, Tid, Tcnt, ..., ECC
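Because every page carries a backpointer in its OOB area, the forward index can in principle be recovered by scanning pages instead of being written on every update. The sketch below illustrates that idea under stated assumptions: the `Page` record and `rebuild_index` function are hypothetical, and using `ver` so that the newest page wins is my reading of the Ver field, not a detail confirmed by the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    addr: int    # physical page address
    oid: int     # backpointer: owning object id
    offset: int  # backpointer: byte offset inside the object
    length: int  # backpointer: number of bytes covered
    ver: int     # version; assumed here to grow with every write


def rebuild_index(pages):
    """Recover {(oid, offset): addr} from backpointers; the newest version wins."""
    index = {}
    newest = {}
    for p in pages:
        key = (p.oid, p.offset)
        if key not in newest or p.ver > newest[key]:
            newest[key] = p.ver
            index[key] = p.addr
    return index


scanned = [
    Page(addr=10, oid=1, offset=0, length=4096, ver=1),
    Page(addr=11, oid=1, offset=4096, length=4096, ver=2),
    Page(addr=12, oid=1, offset=0, length=4096, ver=3),  # overwrites offset 0
]
print(rebuild_index(scanned))
```

Since the index is recoverable this way, writing it back can be deferred, which is the "lazy" part of the technique.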

Updating Window & Checkpointing
[Figure: updating windows]
1. Make sure the mappings are persistent
2. Write back the updating window metadata
• Transactional write
  – Count the total number of pages with the same tid, and compare it with the stored tcnt
Page layout: PAGE DATA | Page Metadata (OOB): TYPE, BACK-POINTER (oid, offset, len), Ver, Tid, Tcnt, ..., ECC
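The commit check for a transactional write can be sketched as a count over the Tid and Tcnt fields. This is a minimal illustration, not the authors' code: the function name `committed_tids` is hypothetical, and the convention that Tcnt is zero on every page except the one that closes the transaction is an assumption.

```python
from collections import Counter


def committed_tids(oob_records):
    """oob_records: iterable of (tid, tcnt) pairs read from page metadata.

    A transaction counts as committed when the number of pages seen with
    its tid equals the tcnt stored on its closing page. Assumption: tcnt
    is 0 on all pages of a transaction except the closing one.
    """
    seen = Counter()
    stored = {}
    for tid, tcnt in oob_records:
        seen[tid] += 1
        if tcnt:
            stored[tid] = tcnt
    return {tid for tid, tcnt in stored.items() if seen[tid] == tcnt}


# Tx1 wrote both of its pages; Tx2 was interrupted before its closing page.
records = [(1, 0), (1, 2), (2, 0)]
print(committed_tids(records))
```

Pages that belong to an uncommitted tid can simply be ignored during recovery, so no separate journal copy of the data is needed.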

T2. Coarse-grained Block State Maintenance
• Free space in flash block units
  – Page states can be identified using the block state
    • Pages in FREE blocks are all free
    • Pages in USED blocks are all used
    • Pages in UPDATING blocks need to be further identified
• Relaxed Metadata Persistence
[Figure: block state transitions]
  – FREE -> UPDATING: Extending Updating Window (Allocation)
  – UPDATING -> USED: Extending Updating Window (Eviction)
  – USED -> FREE: Erase
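The three block states form a small cycle: a FREE block becomes UPDATING when it is allocated into the updating window, USED when it is evicted from the window, and FREE again after an erase. The sketch below encodes that reading; the event names `allocate`, `evict`, and `erase` and both helper functions are labels chosen for illustration.

```python
from enum import Enum


class BlockState(Enum):
    FREE = "free"
    UPDATING = "updating"
    USED = "used"


# Allowed transitions, as read from the state diagram on the slide.
TRANSITIONS = {
    (BlockState.FREE, "allocate"): BlockState.UPDATING,
    (BlockState.UPDATING, "evict"): BlockState.USED,
    (BlockState.USED, "erase"): BlockState.FREE,
}


def next_state(state: BlockState, event: str) -> BlockState:
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {state.name} on {event!r}") from None


def pages_to_inspect(block_states, pages_per_block: int) -> int:
    """Only pages in UPDATING blocks need page-level identification."""
    return sum(pages_per_block for s in block_states if s is BlockState.UPDATING)


blocks = [BlockState.FREE, BlockState.USED, BlockState.UPDATING, BlockState.UPDATING]
print(pages_to_inspect(blocks, pages_per_block=64))
```

The point of the coarse granularity is visible in `pages_to_inspect`: per-page state only matters for the few blocks inside the updating window, so no per-page bitmap has to be persisted.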

T3. Compacted Update
• Compact multiple partial page updates into one flash page
• Co-locate the diff-page with the metadata page
Diff-Layout: each Diff-Extent is (o_off, len, addr)
  – Diff-Extent (3712, 384, 0)
  – Diff-Extent (4096, 768, 384)
  – Diff-Extent (8192, 896, 1152)
  – Diff-Extent (16384, 196, 2048)
Diff-Page (marked 0 to 511): Diff-Data (0), Diff-Data (1), Diff-Data (2), Diff-Data (2), Diff-Data (3)
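The Diff-Extent triples on the slide are consistent with packing partial updates back to back into one diff page: each `addr` equals the sum of the preceding lengths. The sketch below reproduces those numbers. Treating `addr` as a byte offset inside the diff page is my reading of the figure, and the function name `pack_diffs` and the `capacity` parameter are hypothetical.

```python
def pack_diffs(updates, capacity: int = 4096):
    """Pack partial-page updates into one diff page.

    updates: list of (o_off, payload) where o_off is the offset in the object.
    Returns (layout, page) where layout holds (o_off, len, addr) extents and
    addr is the position of the payload inside the packed page.
    """
    layout = []
    page = bytearray()
    for o_off, payload in updates:
        if len(page) + len(payload) > capacity:
            raise ValueError("diff page is full")
        layout.append((o_off, len(payload), len(page)))
        page.extend(payload)
    return layout, bytes(page)


# Object offsets and lengths from the slide; payload bytes are dummy data.
updates = [
    (3712, b"a" * 384),
    (4096, b"b" * 768),
    (8192, b"c" * 896),
    (16384, b"d" * 196),
]
layout, page = pack_diffs(updates)
for extent in layout:
    print(extent)
```

Four partial updates that would otherwise dirty four separate data pages fit in 2,244 bytes of a single page.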

An Example in OFSS
• echo “title” > foo.txt
• echo “texttexttext…”(4KB) >> foo.txt
[Figure: pages in the Updating Window]
Page contents:
  – [Inode], [diff-data: “title”]
  – [data: “title texttexttext…”]
  – [dirent: “foo.txt”]
  – [Inode], [diff-data: “xttext”]
Page metadata (OOB):
  – OMP, , ver1, Tx1, 0, [ECC]
  – ODP, , ver3, Tx2, 0, [ECC]
  – ODP, , ver2, Tx1, 2, [ECC]
  – OMP, , ver4, Tx2, 2, [ECC]

Comparison of Ext3 and OFSS
80 KB (ext3) -> 16 KB (OFSS)
• Journals => Transactional Metadata in Page Metadata
• Inode => Reverse Index in Page Metadata
• Block/Inode Bitmap => Free Space Mgmt. in Flash Block Units
• Page Un-aligned Update => Compaction and Co-location

Evaluation Method
• Evaluation Metric
  – write amplification = flash_writes / app_writes
• System Evaluation Framework
  – OFSS: RWC Trace -> Object File System -> OFTL -> Flash R/W/E
  – File system: RWC Trace -> File System -> Block Trace Utility -> Block Trace -> PFTL -> Flash R/W/E

Overall Comparison (1)

Write Amplification: OFSS is 15.1% of ext3, 52.6% of ext2, and 10.6% of btrfs

Overall Comparison (2)

Write Amplification: OFSS is 36.0% of ext3, 80.2% of ext2, and 51.0% of btrfs

Metadata Amplification
• In OFSS, metadata amplification is dramatically reduced
• Ext3: Journaling
• Ext2: Bitmap and Inode Table
Refer to the paper for more details of the async mode

[Table: metadata amplification per workload, reconstructed from the chart labels]
Workload   Ext3    Ext2   OFSS
iPhoto      2.57   1.06   0.05
iPages      8.59   2.68   0.30
LASR1      17.91   2.00   1.13
LASR2      11.84   0.91   0.45
LASR3      51.04   4.11   1.05
TPCC        3.73   1.04   0.03

Flash Page Size Impact

• Write amplification worsens as the flash page size increases
• The sync mode is much worse than the async mode
Refer to the paper for more details of the async mode
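One contributor to this trend is page alignment: a small synchronous update still costs at least one whole flash page, so the wasted space grows with the page size. The sketch below shows only that lower bound, not the measured results from the evaluation. The function name `min_flash_bytes` and the page sizes in the loop are chosen for illustration.

```python
import math


def min_flash_bytes(update_bytes: int, page_size: int) -> int:
    """Smallest page-aligned amount of flash that can hold the update."""
    return math.ceil(update_bytes / page_size) * page_size


update = 6  # the 6-byte payload of the ext3 example
for page_kb in (4, 8, 16, 32):  # illustrative page sizes
    page_size = page_kb * 1024
    bound = min_flash_bytes(update, page_size) / update
    print(f"{page_kb}KB page: write amplification of at least {bound:.0f}x")
```

Doubling the page size doubles this bound for any update smaller than a page, which is why compacting several partial updates into one page matters more on larger pages.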

Conclusion
• Metadata in file systems is frequently written back for consistency and durability, amplifying the writes to the flash memory
• Flash memory offers opportunities for endurance-aware file system mechanisms
  – Journaling: transactional write
  – Metadata Synchronization: lazy indexing, coarse-grained block state maintenance
  – Page-aligned Update: compacted update

Thanks! Questions?
Youyou Lu, Jiwu Shu, Weimin Zheng ([email protected])
http://storage.cs.tsinghua.edu.cn/~lu

Backup – ext3 layout
[Figure: ext3 layout. The disk is divided into Block Groups; the labelled regions of a group are Super Block, Group Desp., Inode Bitmap, Block Bitmap, Inode Table, and Data Blocks]

Backup – Metadata Amplification (async)

Backup – Impact of Flash Page Size (async)

Backup – Overhead of Window Extending
