Benchmarking SSDs with Transparent Compression

What is transparent compression?

Transparent (or in-line) compression acts like an "accelerator" for many workloads, which can seem counterintuitive, but it makes sense once you understand what's happening under the hood inside the SSD. Transparent compression compresses data before it is written to flash and decompresses it on reads, without any host action; the host may not even know it is happening. This eliminates many of the trade-offs encountered with CPU-based compression, particularly in high-IOPS environments. The basic premise is that by reducing write activity to flash, we achieve the following benefits:

  1. An increase in sustained random write IOPS by lowering garbage collection overhead
  2. Improvements in read latency in mixed read/write workloads by lowering write-to-read interference
  3. Improved endurance by reducing the amount of data written

How does transparent compression work?

It doesn't take much data compressibility to achieve significant gains in performance and endurance. Beyond roughly 2:1 compressibility we see diminishing returns, since the host data physically occupies less than half of the available media and garbage collection can operate with little to no data movement. Another outcome of compression is an effective increase in the SSD's over-provisioning (OP). The market revolves around 7% OP "read intensive" and 28% OP "write intensive" capacity points (e.g., 3.84TB vs. 3.2TB), which differ by just 20% in usable capacity, so a 1.2:1 compression ratio can turn a "read intensive" drive into a "write intensive" one. Coincidentally, we can typically further compress LZ4 (or Snappy) compressed data by about 20% (through our Huffman coding stage).
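
As a back-of-the-envelope illustration of that OP math, here is a minimal sketch; the ~4.1TB of usable NAND and the variable names are illustrative assumptions, not ScaleFlux specifications:

    # Rough effective over-provisioning estimate (all figures illustrative)
    raw=4.1      # usable NAND behind the namespace, in TB
    user=3.84    # advertised capacity, in TB (7% OP SKU)
    cr=1.2       # average compression ratio achieved by the drive
    awk -v raw="$raw" -v user="$user" -v cr="$cr" 'BEGIN {
        physical = user / cr                      # data that actually lands on flash
        op = (raw - physical) / physical * 100    # spare area relative to stored data
        printf "effective OP: %.0f%%\n", op       # ~28% with these inputs
    }'

With the same inputs and cr=1.0, the sketch reports the familiar ~7% of a read-intensive SKU.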

What about the capacity of ScaleFlux SSDs?

If the data is compressible, we can give the host back extra capacity. We do this through the NVMe Thin Provisioning feature set, which lets us set the namespace size larger than the physical capacity backing it. Physical utilization is reported in the namespace usage field and can be monitored by the host. This is not an either/or trade-off with the performance uplift: if data is, on average, 2:1 compressible, we can extend capacity from 3.84TB to 6.2TB and still maintain 28% OP performance levels.
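
If you want to watch this yourself, stock nvme-cli can dump the relevant Identify Namespace fields; the device path below is just an example:

    # Namespace size vs. capacity vs. actual utilization (NVMe Identify Namespace)
    sudo nvme id-ns /dev/nvme0n1 | grep -iE 'nsze|ncap|nuse'
    # nsze : namespace size the host sees
    # ncap : capacity that may be allocated
    # nuse : logical blocks currently consumed on media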

Focus on mixed read/write workloads and latency

When benchmarking a drive with transparent compression, we focus on workloads that mix reads and writes. We evaluate performance based on latency and IOPS for data stored at compression levels ranging from incompressible up to about 2.5:1 (any compressibility beyond that is better applied to extending capacity).
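
A minimal fio invocation along those lines is sketched below; the device, queue depths, and the mapping from buffer_compress_percentage to the drive's achieved compression ratio are assumptions (the fio-scripts README linked further down has measured values):

    # Destructive test: writes directly to the example device /dev/nvme0n1
    fio --name=mixed7030 --filename=/dev/nvme0n1 --ioengine=libaio --direct=1 \
        --rw=randrw --rwmixread=70 --bs=4k --iodepth=32 --numjobs=4 \
        --time_based --runtime=300 \
        --refill_buffers --buffer_compress_percentage=50
    # Set buffer_compress_percentage=0 (or omit it) for an incompressible baseline.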

For example, we have a test where we steadily increase the number of random writes while maximizing read performance:

[Chart: read IOPS response with increasing write IOPS]

In the chart above, ScaleFlux is red and another Gen4 SSD is blue; the solid lines are read IOPS and the dashed lines are achieved write IOPS. As the attempted write IOPS increases, our read performance is far less affected, and we continue to scale write IOPS linearly.
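
A simplified version of that sweep can be scripted with fio's rate_iops cap on the write job; the job names, device, and IOPS steps below are illustrative rather than the exact test from our repo:

    # Fixed random-read job alongside a rate-capped random-write job;
    # step the write cap up between runs and watch read IOPS and latency.
    for wr_iops in 25000 50000 100000 200000; do
      fio --filename=/dev/nvme0n1 --direct=1 --ioengine=libaio --bs=4k \
          --time_based --runtime=120 \
          --name=reads  --rw=randread  --iodepth=64 --numjobs=4 \
          --name=writes --rw=randwrite --iodepth=32 --numjobs=1 \
          --rate_iops=$wr_iops
    done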

How does the drive compare if your workload can't take advantage of compression? Incompressible data simply bypasses the compression engine, so performance remains competitive with any leading Gen4 NVMe SSD on the market. In other words, data compressibility is all upside.

Testing best practices

One thing you may want to do while testing our drive's capacity is to download the upstream version of nvme-cli. It includes the latest version of our plugin, which can easily report the SSD's write amplification and overall compression ratio (nvme sfx /dev/nvme… smart-log-add): https://github.com/linux-nvme/nvme-cli
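
A typical flow looks roughly like the following; build steps differ between nvme-cli releases, and the device path is only an example:

    # Build upstream nvme-cli to pick up the latest ScaleFlux (sfx) plugin
    git clone https://github.com/linux-nvme/nvme-cli.git
    cd nvme-cli
    meson setup .build && meson compile -C .build   # older releases build with plain 'make'
    # Report write amplification and compression ratio for the example device
    sudo .build/nvme sfx smart-log-add /dev/nvme0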

Lastly, our FIO environment is on GitHub. It has some tests designed to highlight the performance advantages (including the data for the plot above). The README has a table mapping FIO settings to the compression ratio achieved by the drive: https://github.com/kpmckay/fio-scripts

Please do not hesitate to contact me on LinkedIn or email our team at [email protected] to ask for a drive to test.

Cheers!

Keith
