How Disks Became 150X Slower Since 1985
The sad truth is that computers have become slower than ever. In this post I'm going to explain why that's the case for disks, and prove it with facts. If your current personal computer or business server feels strangely slower than the systems you recall from 20 years ago, that's because (Moore's Law be damned) it actually is.

Unless you're in a cave on an island in the middle of the Indian Ocean, you haven't escaped the buzz around Big Data. Vast amounts of data are collected and analyzed from every transaction, and increasingly from a wide range of devices (the Internet of Things). The insights from this data will change the world, personalize our interactions, and help cure disease. Just when we thought we couldn't possibly create more data than the number of people on the planet, we've found new ways to explode our data by another order of magnitude: capturing it not only from people, but from a wide range of devices and companies. Then, of course, there is metadata: the data that describes the data itself.
All this data, Terabytes, Petabytes and Exabytes of it, surely requires vast physical resources. Even our personal computers and phones now require dozens if not hundreds of gigabytes to store the data we use daily.
Relax - disks are now so cheap and RAM so large that we are swimming in physical resources.
We're swimming in a sea of cheap, plentiful random access storage - whether it is HDDs, SSDs, RAM or the emerging Storage Class Memory technologies. Or so goes the myth. Truth: it's a fantasy wrapped in a delusion. The harsh reality is that companies are buckling under the weight of this data and the costs of the IT infrastructure to contain and process it. While disk prices have fallen, the volumes have risen so quickly, and the uses for the data have expanded so widely, that immense capital expenditures are needed for the associated RAM and CPUs to examine this data and derive useful information. Here are some sobering considerations on the specific issue of disk speed:
Over the past 30 years, disk drives have very impressively increased their capacity by 10,000X and their I/O transfer rates by 65X. At first blush, it's amazing. Storage costs are now pennies per gigabyte, and with several orders of magnitude of improvement in both capacity and speed, we're rockin' it. But there is a subtlety here, hard to see but with dramatic consequences: capacity has grown far faster than the transfer rate - about 150 times faster, in fact. Although disks are now 65X faster, they individually hold 10,000X more data, so the time required to read the full contents of a disk is actually about 150X longer (10,000 / 65 ≈ 154) than it was 30 years ago. Oh, for the glory days of 1985!
[Disk Transfer Rates, image reused with permission from D. DeWitt, PASS Summit Keynote 2009]
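To see where that 150X comes from, here is a quick back-of-the-envelope sketch in Python. The 1985 baseline (a 100 MB drive streaming at 1 MB/s) is an illustrative assumption of mine, not a measurement; the 10,000X and 65X growth ratios are the ones quoted above.

```python
# Back-of-the-envelope: how long does it take to read an entire disk end to end?
# The 1985 figures are illustrative round numbers; the 10,000X and 65X ratios come from the post.

def full_scan_seconds(capacity_mb, transfer_mb_per_s):
    """Time to stream the whole disk at its sequential transfer rate."""
    return capacity_mb / transfer_mb_per_s

scan_1985 = full_scan_seconds(capacity_mb=100, transfer_mb_per_s=1)                # ~1985-class drive (assumed)
scan_now = full_scan_seconds(capacity_mb=100 * 10_000, transfer_mb_per_s=1 * 65)   # same drive scaled by the ratios

print(f"1985 full scan:  {scan_1985:,.0f} s")              # 100 s
print(f"Today full scan: {scan_now:,.0f} s (~4.3 hours)")  # ~15,385 s
print(f"Slowdown: {scan_now / scan_1985:.0f}X")            # ~154X - the '150X slower' in the title
```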
If the total time to read a disk has essentially gotten 150X worse, what about the price-performance of these disks? After all, storage prices have been dropping rapidly. In fact, the cost has dropped from about $30 per megabyte in 1985 to roughly 40 cents per gigabyte today - fractions of a penny per megabyte. In other words, while the effective performance of disks is 150X slower than it was 30 years ago, the price per MB is now roughly 75,000 times cheaper. See the diagram below - note that disk prices are shown as "X" symbols and that the y-axis is logarithmic. So, indeed, price-performance has improved. It's a sorry consolation that we can buy so much disk when we have to wait 150X longer to see what's on it.
(Source: here)
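As a quick sanity check on those figures, a tiny sketch using the numbers quoted above (and assuming 1 GB = 1,000 MB):

```python
# Price per megabyte, then and now, using the figures quoted in the post.
price_1985_per_mb = 30.00          # ~$30 per MB in 1985
price_today_per_mb = 0.40 / 1_000  # ~40 cents per GB today

cost_drop = price_1985_per_mb / price_today_per_mb
print(f"Cost per MB: ~{cost_drop:,.0f}X cheaper")                 # ~75,000X

# Even after dividing by the ~150X slowdown in full-scan time,
# price-performance still comes out well ahead:
print(f"Rough price-performance gain: ~{cost_drop / 150:,.0f}X")  # ~500X
```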
The next time you need to buy disks, give some thought to what you need to put there. These days, less may be more.
[In my next post I'll discuss CPU and RAM - are we further ahead?]
Data & Technology Transformation | Global Data & Analytics Lead
8 年Thanks Sam for this clear and concise article. I agree with John that columnar storage technology available from SAP Hana, Microsoft SQL Server and IBM DB2 and the likes are a great leap forward and to your point; less is more. With Columnstore available in SQL Server since 2012 the "average Joe" now have the ability to increase the responsiveness of his \ her BI solution to customers without costing an arm and a leg. The sad reality is that few technologist utilize this capability effectively in solving business challenges.
Creating Predictable Proven Revenue Processes Daily (9 years ago):
Sam, insightful analysis of the current state of data management. How do we deal with this? Is there a way for us to speed up data access with existing technologies, or is there something new coming soon?
Sam, great analysis! Back when disks were 2 GB each, it would take 512 disks to make up 1 TB, so we automatically had the bandwidth. Today we have 1-2 disks - exactly your point. The other big issue is stalled processing: we have all these cores, and the processing gets stalled. That's a big reason the benefits of columnar in-memory storage and processing are so significant. SAP IQ did a good job of processing columnar data; its pages were large for a relational system, at 128k to 512k, and packed with a single column of compressed data values, but the processor would still need the next page of values. Prefetching helped make sure the page was in the database cache, but it still took time for the cached page to get into the CPU cache (TLB). SAP HANA stores the columns in contiguous memory in a compressed format and uses the Intel parallel vectorization libraries to go through the data. That keeps the cores from stalling (they get their full quantum, i.e. time slice, from the scheduler) and plows through the data at memory speed. Things are getting better for disk-based databases using SSDs and flash instead of slow spinning disks, but small I/O sizes (a large number of IOPS) and row storage (CPU cache thrashing from parsing unnecessary columns) still put a significant drag on performance. Having compressed column data stored and processed independently in memory is a big leap forward.
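To make the layout point in that comment concrete, here is a small, purely illustrative Python sketch (the sizes and data are made up, and real engines such as SAP HANA or SQL Server columnstore add compression and vectorization on top of the layout advantage): summing one column out of twenty is typically faster when that column sits in one contiguous array than when it has to be picked out of row tuples.

```python
# Toy illustration of row vs. column layout: summing one column out of twenty.
# Sizes and data are invented for illustration only.
import array
import random
import time

n_rows, n_cols = 200_000, 20
random.seed(0)

# Row storage: each row is a tuple of 20 values; scanning one column touches every row object.
rows = [tuple(random.random() for _ in range(n_cols)) for _ in range(n_rows)]

# Column storage: the same column lives in one contiguous array of doubles.
col = array.array("d", (r[3] for r in rows))

t0 = time.perf_counter(); row_sum = sum(r[3] for r in rows); t1 = time.perf_counter()
t2 = time.perf_counter(); col_sum = sum(col);                t3 = time.perf_counter()

print(f"row-oriented scan:    {t1 - t0:.4f} s")
print(f"column-oriented scan: {t3 - t2:.4f} s")  # usually several times faster: contiguous and cache-friendly
assert abs(row_sum - col_sum) < 1e-6
```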