Benchmarking SSDs with Transparent Compression

What is transparent compression?

Transparent (or in-line) compression acts like an "accelerator" for many workloads, which can seem counterintuitive, but it makes sense once you understand what's happening under the hood inside the SSD. Transparent compression compresses data before it is written to flash and decompresses it on reads, without any host action; the host may not even know it is happening. This eliminates many of the trade-offs encountered with CPU-based compression, particularly in high-IOPS environments. The basic premise is that by reducing write activity to flash, we achieve the following benefits:

  1. An increase in sustained random write IOPS by lowering garbage collection overhead
  2. Improvements in read latency in mixed read/write workloads by lowering write-to-read interference
  3. Improved endurance by reducing the amount of data written

How does transparent compression work?

It doesn't take much data compressibility to achieve significant gains in performance and endurance. Beyond roughly 2:1 compressibility we see diminishing returns, since the host data physically occupies less than half of the available media and garbage collection can operate with little to no data movement. Another outcome of compression is an effective increase in the SSD's over-provisioning (OP). The market revolves around 7% OP "read intensive" and 28% OP "write intensive" capacity points (e.g., 3.84TB vs. 3.2TB), which differ by just 20% in usable capacity, so a 1.2:1 compression ratio can turn a "read intensive" drive into a "write intensive" one. Coincidentally, we can typically further compress LZ4 (or Snappy) compressed data by about 20% (through our Huffman coding stage).
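
As a back-of-the-envelope illustration of that OP math, here is a minimal sketch; the ~4.1TB of usable NAND and the variable names are illustrative assumptions, not ScaleFlux specifications:

    # Rough effective over-provisioning estimate (all figures illustrative)
    raw=4.1      # usable NAND behind the namespace, in TB
    user=3.84    # advertised capacity, in TB (7% OP SKU)
    cr=1.2       # average compression ratio achieved by the drive
    awk -v raw="$raw" -v user="$user" -v cr="$cr" 'BEGIN {
        physical = user / cr                      # data that actually lands on flash
        op = (raw - physical) / physical * 100    # spare area relative to stored data
        printf "effective OP: %.0f%%\n", op       # ~28% with these inputs
    }'

With the same inputs and cr=1.0, the sketch reports the familiar ~7% of a read-intensive SKU.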

What about the capacity of ScaleFlux SSDs?

If the data is compressible, we can give the host back extra capacity. We do this through the NVMe Thin Provisioning feature set, which lets us set the namespace size larger than the physical capacity backing it. Physical utilization is reported in the namespace usage field and can be monitored by the host. This is not an either/or trade-off with the performance uplift: if data is, on average, 2:1 compressible, we can extend capacity from 3.84TB to 6.2TB and still maintain 28% OP performance levels.
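
If you want to watch this yourself, stock nvme-cli can dump the relevant Identify Namespace fields; the device path below is just an example:

    # Namespace size vs. capacity vs. actual utilization (NVMe Identify Namespace)
    sudo nvme id-ns /dev/nvme0n1 | grep -iE 'nsze|ncap|nuse'
    # nsze : namespace size the host sees
    # ncap : capacity that may be allocated
    # nuse : logical blocks currently consumed on media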

Focus on mixed read/write workloads and latency

When benchmarking a drive with transparent compression, we focus on workloads that mix reads and writes. We evaluate performance based on latency and IOPS for data stored at compression levels ranging from incompressible up to about 2.5:1 (any compressibility beyond that is better applied to extending capacity).
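
A minimal fio invocation along those lines is sketched below; the device, queue depths, and the mapping from buffer_compress_percentage to the drive's achieved compression ratio are assumptions (the fio-scripts README linked further down has measured values):

    # Destructive test: writes directly to the example device /dev/nvme0n1
    fio --name=mixed7030 --filename=/dev/nvme0n1 --ioengine=libaio --direct=1 \
        --rw=randrw --rwmixread=70 --bs=4k --iodepth=32 --numjobs=4 \
        --time_based --runtime=300 \
        --refill_buffers --buffer_compress_percentage=50
    # Set buffer_compress_percentage=0 (or omit it) for an incompressible baseline.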

For example, we have a test where we steadily increase the number of random writes while maximizing read performance:

[Chart: read IOPS response with increasing write IOPS]

In the chart above, ScaleFlux is red and another Gen4 SSD is blue; the solid lines are read IOPS and the dashed lines are achieved write IOPS. As the attempted write IOPS increases, our read performance is far less affected, and we continue to scale write IOPS linearly.
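
A simplified version of that sweep can be scripted with fio's rate_iops cap on the write job; the job names, device, and IOPS steps below are illustrative rather than the exact test from our repo:

    # Fixed random-read job alongside a rate-capped random-write job;
    # step the write cap up between runs and watch read IOPS and latency.
    for wr_iops in 25000 50000 100000 200000; do
      fio --filename=/dev/nvme0n1 --direct=1 --ioengine=libaio --bs=4k \
          --time_based --runtime=120 \
          --name=reads  --rw=randread  --iodepth=64 --numjobs=4 \
          --name=writes --rw=randwrite --iodepth=32 --numjobs=1 \
          --rate_iops=$wr_iops
    done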

How does the drive compare if your workload can't take advantage of compression? Incompressible data simply bypasses the compression engine, so performance remains competitive with any leading Gen4 NVMe SSD on the market. In other words, data compressibility is all upside.

Testing best practices

One thing you may want to do while testing our drive's capacity is to download the upstream version of nvme-cli. It includes the latest version of our plugin, which can easily report the SSD's write amplification and overall compression ratio (nvme sfx /dev/nvme… smart-log-add): https://github.com/linux-nvme/nvme-cli
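
A typical flow looks roughly like the following; build steps differ between nvme-cli releases, and the device path is only an example:

    # Build upstream nvme-cli to pick up the latest ScaleFlux (sfx) plugin
    git clone https://github.com/linux-nvme/nvme-cli.git
    cd nvme-cli
    meson setup .build && meson compile -C .build   # older releases build with plain 'make'
    # Report write amplification and compression ratio for the example device
    sudo .build/nvme sfx smart-log-add /dev/nvme0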

Lastly, our FIO environment is on GitHub. It has some tests designed to highlight the performance advantages (including the data for the plot above). The README has a table mapping FIO settings to the compression ratio achieved by the drive: https://github.com/kpmckay/fio-scripts

Please do not hesitate to contact me on LinkedIn or email our team at [email protected] to ask for a drive to test.

Cheers!

Keith
