Google still loves disks. Should you?
Last month at the USENIX Conference on File and Storage Technologies (FAST), Google’s Eric Brewer gave a really interesting keynote talk (with an associated paper) on spinning disks and their role in Google’s datacenters. Brewer’s presentation drove at two main points:
- Disks are not dead. Brewer and his team believe that the cost of disk relative to flash dictates that Google will continue to actively use disk in its datacenters for years to come.
- Disks need to change. The disks we use today are more appropriate as single drives installed in a desktop computer than they are as members of a collection of millions of drives in a large-scale datacenter. The Google paper proposes a set of pretty dramatic changes to disks to make them more effective for cloud storage.
Google operates at a scale that is inherently exciting to any sensible technologist. As a result, when they tell us stories about their infrastructure, there is a natural tendency to want to draw lessons that can be applied to our own environments. Are the observations that Google makes about hard disks equally applicable to enterprise storage?
Is your data like Google’s data?
Large-scale datacenter companies like Google and Facebook store massive amounts of data. The example used in Brewer’s FAST keynote is that Google ingests a petabyte of YouTube video every day.
A petabyte per day.
This is an important starting point for understanding where Google is coming from: the capacity of their system is growing at breakneck speed, and there is obviously a very significant win to be had in ensuring that the storage capacity they are building is as inexpensive as possible.
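To put that number in perspective, here is a quick back-of-the-envelope sketch in Python. The 4 TB drive size and 180 MB/s sequential rate are illustrative assumptions, not figures from the keynote or the paper.

```python
# Back-of-the-envelope: what does ingesting a petabyte per day imply?
# The drive capacity and sequential throughput below are illustrative
# assumptions, not figures from Google's keynote or paper.

TB_INGESTED_PER_DAY = 1000        # a petabyte per day
SECONDS_PER_DAY = 86_400

DRIVE_CAPACITY_TB = 4             # assumed nearline drive capacity
DRIVE_SEQ_MB_S = 180              # assumed sustained sequential write rate

ingest_mb_s = TB_INGESTED_PER_DAY * 1_000_000 / SECONDS_PER_DAY

drives_per_day_for_space = TB_INGESTED_PER_DAY / DRIVE_CAPACITY_TB
drives_busy_writing = ingest_mb_s / DRIVE_SEQ_MB_S

print(f"Sustained ingest rate:         {ingest_mb_s:,.0f} MB/s")
print(f"New drives needed per day:     {drives_per_day_for_space:,.0f}")
print(f"Drives kept busy just writing: {drives_busy_writing:,.0f}")
```

Even under these rough assumptions, that is hundreds of new drives every single day just to hold the new video, before any replication or erasure-coding overhead.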
But Google’s case for disks is about more than just capacity. It’s important to think about the workload requirements that are being put on those spindles. While the exact usage patterns for the massive amount of video data stored on YouTube are not public, Facebook characterized how different types of media objects, such as pictures and videos, are accessed in f4, a similarly high-capacity disk-based storage system for “warm” data.
What is warm data? Facebook found that images and videos show exponential decay in access frequency as they age: Request rates to a given object drop by an order of magnitude in less than a week and two orders of magnitude in a couple of months. This is pretty intuitive: older content becomes less popular in a hurry and so is viewed much less as it ages.
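To get a feel for what that decay implies, here is a small illustrative sketch: it fits a single exponential to the one-week, order-of-magnitude figure and asks how quickly an object’s lifetime requests are used up. The real curve flattens with age (which is exactly why the tail stays warm rather than going cold), so treat this as a rough upper bound on how front-loaded access is.

```python
import math

# Illustrative only: fit a single exponential r(t) = r0 * exp(-lam * t) to the
# observation that request rates drop by roughly 10x within a week of upload.
lam = math.log(10) / 7            # implied per-day decay constant

def fraction_served_by_day(t_days: float) -> float:
    """Fraction of an object's lifetime requests already served by day t,
    under the assumed exponential model (normalized integral of r(t))."""
    return 1 - math.exp(-lam * t_days)

for day in (1, 7, 30):
    print(f"day {day:>2}: {fraction_served_by_day(day):5.1%} of lifetime requests served")
```

Under that assumption, roughly 90% of an object’s requests arrive in its first week.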
There are two additional really interesting properties of this type of data:
- Tape wouldn’t help. Even though content is accessed less and less as it ages, when it is accessed it needs to be available in a responsive way -- this is precisely why this data is described as warm as opposed to cold. Tape-based archiving would be very inexpensive, but access times in minutes (or more) make it a non-starter for data that needs to be served on a web page.
- The law of large numbers ensures predictable and low-rate demand. Grouping together large amounts of very infrequently accessed data leads to access patterns that are very predictable, and predictably low. f4, for example, is designed to expect a peak request rate of only 20 IOPS per TB of data stored.
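A minimal back-of-the-envelope shows why aggregation makes demand predictable. Assuming independent arrivals (so the pool’s load is roughly Poisson) and a made-up per-object request rate, the relative second-to-second variability of the aggregate shrinks as the pool grows:

```python
import math

# For many independent warm objects, aggregate arrivals are well modelled as
# Poisson with the summed rate, so relative variability (std / mean) is
# 1 / sqrt(mean). The per-object rate here is a made-up illustrative number.
PER_OBJECT_RATE = 0.02            # assumed mean requests per second per object

for n_objects in (100, 10_000, 1_000_000):
    mean_rps = n_objects * PER_OBJECT_RATE
    relative_variability = 1 / math.sqrt(mean_rps)
    print(f"{n_objects:>9,} objects: ~{mean_rps:9,.0f} req/s on average, "
          f"second-to-second swing ~{relative_variability:.1%} of the mean")
```

The variability falls roughly as one over the square root of the pool size, which is why a provider with millions of warm objects can provision confidently against a figure like 20 IOPS per TB, while a small pool cannot.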
So, bringing Google’s and Facebook’s data back to an enterprise context, it’s worth realizing that these cloud application providers have a large and growing volume of very infrequently accessed data, and that they are able to engineer large-scale storage systems, with millions of drives, to spread that work over. It’s also important to remember that even in this application domain, disks are only one component in what is typically a larger storage system.
In our experience, most enterprises do not benefit from either the volume or the uniformity of warm data that applications like YouTube and Facebook get to design for. Here’s a bit of an over-simplification: if you have a bunch of files, each needing 20 IOPS, and you're Google, no problem: put each of them on its own disk. But if you try to do that in your business, you're going to own a lot of underused disks. Moreover, enterprise data is more diverse. Instead of a single large-scale application like YouTube that houses petabytes of primarily read-only data, most enterprise environments host a huge variety of relatively smaller (than YouTube) apps. The diversity and unpredictability of these workloads, and the relatively smaller size of even large-scale enterprise deployments, continue to make disks challenging to work with.
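To make the over-simplification a little more concrete, here is a rough sizing sketch. A disk pool has to buy enough drives for both capacity and IOPS, and whichever constraint needs more drives wins; every number below (drive size, drive IOPS, dataset sizes, demand) is an assumption chosen for illustration.

```python
import math

# Rough sizing sketch: a disk pool must buy enough drives for capacity AND for
# IOPS; whichever constraint needs more drives wins. All numbers below are
# illustrative assumptions, not vendor or Google/Facebook figures.

DRIVE_CAPACITY_TB = 4      # assumed nearline drive capacity
DRIVE_RANDOM_IOPS = 80     # assumed random IOPS a 7200 RPM drive sustains

def size_pool(dataset_tb: float, peak_iops_per_tb: float) -> None:
    by_capacity = math.ceil(dataset_tb / DRIVE_CAPACITY_TB)
    by_iops = math.ceil(dataset_tb * peak_iops_per_tb / DRIVE_RANDOM_IOPS)
    drives = max(by_capacity, by_iops)
    utilization = dataset_tb / (drives * DRIVE_CAPACITY_TB)
    print(f"{dataset_tb:>9,.0f} TB at {peak_iops_per_tb:>3.0f} IOPS/TB: "
          f"{by_capacity:,} drives for space, {by_iops:,} for IOPS "
          f"-> buy {drives:,}, capacity {utilization:.0%} used")

# A large, uniform warm store provisioned at ~20 IOPS per TB of data.
size_pool(dataset_tb=100_000, peak_iops_per_tb=20)

# A smaller enterprise dataset with spikier, more diverse demand (made up).
size_pool(dataset_tb=200, peak_iops_per_tb=200)
```

Under these made-up numbers the large, uniform warm store keeps its spindles both full and busy, while the smaller, spikier enterprise pool buys ten times the capacity it needs just to get enough spindles: exactly the "lot of underused disks" problem.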
Poor performance is a distraction
While flash drives offer at least a 100x throughput increase over disks for random workloads, warm data does not obviously benefit. This is because, at least in use cases like those at Google and Facebook, hard disks can easily satisfy the predictable and low request rates associated with warm data. The problem with disks, however, arises in the face of unexpected events: the sheer physicality of drive mechanics -- rotational velocity, drive armature movement and so on -- means that the ceiling on drive performance is both low and fragile. In particular, spinning drives are very sensitive to the “IO Blender” of competing workloads: what if there is a sudden and unexpected hot spot of accesses to a very large volume of data that is two years old? What if the system has to perform recovery from a large batch of concurrently failing drives?
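One way to see how low and fragile that ceiling is: model a single device as a simple M/M/1 queue, where the mean response time is 1 / (mu - lambda). The service rates below are assumptions (on the order of 100 random IOPS for a spinning disk, tens of thousands for a flash device), not measurements.

```python
# Illustrative M/M/1 queueing sketch: mean response time W = 1 / (mu - lambda),
# where mu is the device's random-IO service rate and lambda is the offered
# load. Device rates are rough assumptions, not measurements.

def mean_response_ms(service_iops: float, offered_iops: float) -> float:
    if offered_iops >= service_iops:
        return float("inf")            # queue grows without bound
    return 1000.0 / (service_iops - offered_iops)

DISK_IOPS = 100        # assumed random IOPS for a 7200 RPM drive
FLASH_IOPS = 50_000    # assumed random IOPS for a flash device

for load in (20, 60, 90, 95):          # offered random IOPS hitting one device
    disk = mean_response_ms(DISK_IOPS, load)
    flash = mean_response_ms(FLASH_IOPS, load)
    print(f"{load:>3} IOPS offered: disk {disk:7.1f} ms   flash {flash:6.3f} ms")
```

The exact numbers are not the point: a modest, unplanned hotspot pushes a spindle close to its ceiling and response times climb steeply, while flash is operating so far below its ceiling that the same hotspot is invisible.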
In contrast, flash reduces the penalty, and so also the risk, associated with these unexpected events. Not having to reason about drive mechanics simplifies both system design and operation. We all know that flash-based storage is fast, but its real value is in the simplicity that follows from not being unexpectedly slow. Among Coho’s customers, we have seen several examples over the past couple of years of enterprises moving to all-flash datacenters -- literally instituting programs to replace all existing storage with flash-only hardware. In the words of one customer: “I don’t actually need 2000 IOPS per GB, I just want to stop having to diagnose and debug performance problems associated with spinning disks.”
Not having to decide up front
More than all of this, though, the ability to randomly access data stored in flash results in a sea change in how we think about data itself. For decades, storage has been exactly that: a closet or an attic or a garage where infrequently used things can be placed, and where finding them again at some point in the future is likely to be frustratingly time consuming (and hence discouraged). The throughput and random access capabilities of flash mean that we are free to go and analyze large amounts of warm data, with low effort, whenever we like.
One of the most compelling use cases that we have seen for enterprise all-flash systems has been the ability to add data analysis tasks on top of existing stored data. The fact that flash-based systems can stomach the multitenancy of running new analytics alongside a production storage workload empowers organizations to explore their own data, to find new value, new opportunities, and new applications in it. Even before flash prices match disk on cost per unit of capacity, this profound ability to actually analyze stored data may be worth paying a premium for.
This article was written with Mihir Nanavati.
The watch movements shown at the top of this article are from Guido Mocafico: Movement. On the left is a Chopard Perpetual Calendar with moonphase indicator, and on the right a Patek Philippe split-seconds chronograph movement.