The Data Center of the Future is Data Driven - Part 2
Matthew Hardman
Hybrid Cloud | High Performance Applications | Data Ops | Strategy | Leadership
Wow, it took forever to get back to part two. Sorry about the delay; I hope the wait is worth it!
At the end of my last article here, we laid down the principle that the data we deal with in the enterprise consists of structured and unstructured data. While most of that data is unstructured, organizations spend a high percentage of their budgets on structured data (Oracle, SQL Server, DB2).
Most of us today are very familiar with structured data: the data we store in databases, inside tables, in rows and columns.
The way we get data from these tables is by writing Structured Query Language, or SQL, something like...
"SELECT [Last Name] FROM Table WHERE [Primary Key] = 2"
The result of which would be "Anderson".
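To make that concrete, here is a minimal Python sketch that runs the same kind of query against an in-memory SQLite database. The table name "people", the "First Name" column, and every row other than key 2 / "Anderson" are assumptions made up for illustration; only the query shape and its result come from the example above.

```python
import sqlite3

# Hypothetical sample table: only the key 2 -> "Anderson" pairing comes from
# the article; the table name, first names and other rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE people ('
    '"Primary Key" INTEGER PRIMARY KEY, "First Name" TEXT, "Last Name" TEXT)'
)
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [(1, "Alice", "Smith"), (2, "Bob", "Anderson"), (3, "Carol", "Jones")],
)


def last_name_for(key):
    """Return the last name stored under the given primary key, or None."""
    row = conn.execute(
        'SELECT "Last Name" FROM people WHERE "Primary Key" = ?', (key,)
    ).fetchone()
    return row[0] if row else None


print(last_name_for(2))  # prints Anderson
```

The point is simply that the structure of the table (named columns, a key) is what makes the question answerable in one line.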
Now let's look at a piece of unstructured data, like an image...
You can't really run a query on an image the way you can on a table to find out, say, a person's name. Yet this is the type of unstructured data that represents unrealized value, and it's the type of data that is growing exponentially. So the question is: how do you make this data useful?
Well, the answer is metadata. Metadata is data that describes data. It's data that can be searched or discovered, it helps you identify the useful data you possess, and it can help you turn that data into value. You could imagine that the metadata attached to this picture might look something like this...
[Operational Metadata]
ImageType=Colour;
DateTaken=23/01/2018;
Resolution=640x480
[Custom Metadata]
PeopleInPicture=4
PeoplesNames=("Veronica", "Jeff", "Betty", "Paul")
If you had ten thousand pictures, you would then be able to query your data along the lines of...
"SELECT PICTURES WHERE PeoplesNames='Betty'"
Of course, this is an oversimplification of how search would work, but you get the point: metadata becomes key to discovering your valuable information, and because the data is different, you need to think about a different way to handle it.
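As a rough illustration of the idea, the sketch below keeps the metadata for a handful of pictures in plain Python dictionaries and searches it by name. The file names and the second and third records are hypothetical; the field names mirror the metadata example above. A real platform would index this metadata rather than scan it, so treat this as a sketch of the concept, not of how any product implements search.

```python
# Hypothetical picture catalog. Only the first record's values come from the
# article's metadata example; file names and the other records are invented.
pictures = [
    {
        "file": "IMG_0001.jpg",
        "ImageType": "Colour",
        "DateTaken": "23/01/2018",
        "Resolution": "640x480",
        "PeopleInPicture": 4,
        "PeoplesNames": ["Veronica", "Jeff", "Betty", "Paul"],
    },
    {
        "file": "IMG_0002.jpg",
        "ImageType": "Colour",
        "DateTaken": "24/01/2018",
        "Resolution": "640x480",
        "PeopleInPicture": 1,
        "PeoplesNames": ["Jeff"],
    },
    {
        "file": "IMG_0003.jpg",
        "ImageType": "Colour",
        "DateTaken": "25/01/2018",
        "Resolution": "640x480",
        "PeopleInPicture": 2,
        "PeoplesNames": ["Betty", "Paul"],
    },
]


def pictures_with(name, catalog):
    """Return the file names of all pictures whose metadata lists `name`."""
    return [p["file"] for p in catalog if name in p["PeoplesNames"]]


print(pictures_with("Betty", pictures))  # prints ['IMG_0001.jpg', 'IMG_0003.jpg']
```

Notice that the search never opens an image. Everything it needs lives in the metadata.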
Data defines storage strategy
Data centers grew up around the architectures of the data they stored, and that data came largely from structured systems. The workhorse for structured systems and data has been block storage, which has served the enterprise well, delivering increasingly high performance and resiliency. However, with the rapid growth of unstructured data, organizations are starting to see challenges arise from trying to place unstructured data into the structured systems that sit on block storage. A good example of this is SharePoint. Now don't get me wrong, SharePoint is a great application, promoting team collaboration, search and more, but there is one area where it is challenged, and that is the way unstructured data such as files is handled. Each file uploaded to a SharePoint library is stored as a BLOB in a database, so as more and more users start uploading, your database is going to fill up very quickly, resulting in degraded performance and the need to license more databases to handle your file loads. There is a better way.
Object Storage platforms are revolutionizing the way we handle unstructured data. Files are simply put into a flat structure; there are no file hierarchies or blocks to define as in other storage systems. The Object Storage platform also manages the metadata associated with each object, which is key to finding the original data, as discussed earlier. Finally, access to all of these objects is provided through standard industry protocols (which I won't go into here, but just know they are used everywhere). A good example of a company that uses Object Storage would be someone like Spotify: managing millions of songs can't be done via a file hierarchy, because each song needs to be found and retrieved quickly, hence object storage and metadata win out.
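The flat structure described above can be sketched in a few lines of Python. This is a toy, in-memory model written purely for illustration: the class name FlatObjectStore, its put/get/find methods, and the sample songs are all assumptions, and it does not represent the API of any real object storage product or how any particular company stores its data.

```python
import uuid


class FlatObjectStore:
    """Toy in-memory object store: no folders, just IDs, data and metadata."""

    def __init__(self):
        # One flat mapping: object ID -> (data bytes, metadata dict).
        self._objects = {}

    def put(self, data, **metadata):
        """Store an object with its metadata and return a generated ID."""
        object_id = uuid.uuid4().hex
        self._objects[object_id] = (data, dict(metadata))
        return object_id

    def get(self, object_id):
        """Retrieve the object's data directly by ID, with no path to walk."""
        return self._objects[object_id][0]

    def find(self, **criteria):
        """Return the IDs of objects whose metadata matches every criterion."""
        return [
            oid
            for oid, (_, meta) in self._objects.items()
            if all(meta.get(k) == v for k, v in criteria.items())
        ]


# Hypothetical usage with made-up songs.
store = FlatObjectStore()
song_id = store.put(b"audio-bytes-1", artist="Example Artist", title="Example Song")
store.put(b"audio-bytes-2", artist="Another Artist", title="Another Song")

matches = store.find(artist="Example Artist")
print(store.get(matches[0]))  # prints b'audio-bytes-1'
```

The design choice worth noticing is that the only way to locate an object is by its ID or by its metadata. There is no directory tree to browse, which is why the quality of the metadata matters so much.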
There are many types of data in our organizations today that will ultimately benefit from Object Storage technologies: think about user files, logs, images, call logs, etc. These are all object types that contain data not easily found or discovered, but when we enrich those objects with metadata, all of a sudden we can easily identify what we are looking for.
As a data company, we have been working with customers on these challenges for a long time, and for managing unstructured data we have something called the Hitachi Content Platform (https://www.hitachivantara.com/en-us/products/cloud-object-platform/content-platform.html). It's not just an Object Storage platform, but really a data intelligence platform that enables you to store your objects and manage the metadata, in addition to managing compliance of the objects for legal reasons, etc. This type of system will become the data repository for all the applications you will build in the future to drive insights or competitive advantage by aggregating all your information together and enabling analytics to run on top of it all.
In closing, as you continue to modernize your data center, it's critical to understand the types of data you will be managing today and in the future. Managing unstructured data is something you absolutely need to consider, as the data being created in that space is quickly outpacing the data we create in structured data systems.