The Data Center of the Future is Data Driven - Part 1
Matthew Hardman
Hybrid Cloud | High Performance Applications | Data Ops | Strategy | Leadership
In this first part of a two-part article, I want to talk about the Data Driven Data Center: why it's coming, and how customers need to think about designing it.
I remember when I worked at VMware and we introduced the concept of the Software Defined Data Center (SDDC). I also remember wondering how anyone would ever remember "SDDC"; I guess I got that wrong. It was a pretty big game changer for us as employees, and even more so for customers to accept, but there is no denying that the SDDC is very real and here to stay. We aren't just talking about the data center you may run on premises, either. It's everything: private cloud, public cloud, hybrid cloud and so on. Software defined compute (vSphere, Hyper-V, etc.) used to be the only piece most people used, but we now have software defined storage, software defined networking and more.
The technological leaps have been awesome for people on the IT side of the business, and the impact on the business can be brilliant. But let's be honest: business outcomes have never really driven the need for a virtual SAN or a virtual switch. What a business craves is insights into its performance, its customers, its competition, its offerings and so on, and these insights are going to be driven by data. Most organisations are less concerned with how they get that data or what it is hosted on. They care about what data they are getting, and how quickly and easily they can get it.
If you think like the business, you realise that the need for data, and hence the data itself, will be the core thing that defines and drives the investments your organisation makes in the data center, today and beyond. If you accept that, then you need to understand which technologies will enable the best ingestion, retention and accessibility of your data. To understand that, you first need to recognise that not all data is the same.
Structured vs Unstructured Data
It's very easy to think of all data as just 1s and 0s. At the end of the day, it can all be broken down into that form, but that form isn't especially useful to you. What is important to realise is that there are two main types of data, today and for the foreseeable future:
- Structured Data: Data that exists in structures such as tables in a database, with a formal, enforced format it must follow, organised in rows and columns. Think of it the way an Excel spreadsheet works.
- Unstructured Data: Data that has no formalised structure, in that it can't be broken down into rows and columns. Think of data like pictures, call logs, videos, scans, etc.
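To make the distinction concrete, here is a minimal Python sketch. The customer table, its column names and its values are hypothetical and made up for illustration. The PNG bytes simply stand in for any picture, video or scan. The structured rows must match an enforced schema and can be queried by column. The unstructured blob has no rows or columns to query at all.

```python
import csv
import io

# Hypothetical schema for a structured "customer" table (illustrative only).
SCHEMA = ("customer_id", "name", "country")

# Structured data: every record follows the same rows-and-columns format.
structured_csv = """customer_id,name,country
1001,Alice,AU
1002,Bob,NZ
"""

def load_structured(text: str) -> list[dict[str, str]]:
    """Parse CSV text, rejecting input whose columns don't match the schema."""
    reader = csv.DictReader(io.StringIO(text))
    if tuple(reader.fieldnames or ()) != SCHEMA:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    return list(reader)

rows = load_structured(structured_csv)

# Because the format is enforced, we can query directly by column name.
au_customers = [r["name"] for r in rows if r["country"] == "AU"]

# Unstructured data: an opaque blob (an 8-byte PNG signature plus padding).
# There are no columns to filter on. Getting meaning out of it (what is in
# the picture?) needs extra processing such as image analysis.
unstructured_blob = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16

print(au_customers)            # ['Alice']
print(len(unstructured_blob))  # 24
```

About the only thing you can "query" on the blob without further processing is its size. That gap is exactly why unstructured data needs different tooling from the relational systems most organisations have invested in.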
What's really interesting is that the ratio between the two is heavily unbalanced. You might think that most of the data captured is structured, because you know your organisation runs, and spends lots of money on, things like Oracle, SQL Server and DB2... let me surprise you.
First, you need to understand that data is growing fast... really fast, probably faster than we expect. How fast? Well, consider this quote...
"Between the dawn of civilization and 2003, we only created five exabytes; now we're creating that amount every two days. By 2020, that figure is predicted to sit at 53 zettabytes (53 trillion gigabytes) -- an increase of 50 times." - Hal Varian, Chief Economist at Google
Where is all this data coming from? Let me give you a hint. Think about all the people out there today with mobile phones. What are they doing with them? Pictures, chats, videos, poop emoticons... every interaction generates data. And that is just the beginning: more and more machinery is producing thousands of data points every second. All of this data is unstructured. In fact, the growth of data looks like this:
Source: Patrick Cheesman
Houston, we have a problem... funnily enough, organisations have been spending millions of dollars building systems that manage structured data, while the growth of data has been happening on the unstructured side. If you're looking for something that is driving this, have a look at what Gartner is saying.
Data Source: https://www.gartner.com/newsroom/id/3598917
In fact, what they see happening, which aligns with our "Cambrian Explosion of Data" graph, is that:
"... from 2018 onwards, cross-industry devices, such as those targeted at smart buildings (including LED lighting, HVAC and physical security systems) will take the lead as connectivity is driven into higher-volume, lower cost devices."
All of these devices are going to produce more and more data at shorter intervals. Many of them will likely produce data from a consumer point of view (i.e. smart devices). Even so, as Gartner puts it, businesses will spend more, and that spend will ultimately reach $3 trillion by 2020. Businesses will want to capture all of the data these devices create, and more. They will start using it for analysis and insights, to optimise existing processes or realise new opportunities.
This need will put pressure on IT: the relentless pursuit of data to uncover insights. The data centers of tomorrow still need to deal with what we managed yesterday (structured data). But they absolutely need to be ready to manage, and discover insights from, the deluge of data we are about to uncover tomorrow (unstructured data).
In the next part, we will talk about actual systems that can be implemented, and the success customers have had in getting on top of unstructured data.