The Data Center of Future is Data Driven - Part 2

Matthew Hardman

Hybrid Cloud | High Performance Applications | Data Ops | Strategy | Leadership

Published: March 20, 2018

Wow, it took forever to get back to part two. Sorry about the delay; I hope the wait was worth it!

At the end of my last article, we laid down the principle that the data we deal with in the enterprise consists of structured and unstructured data. While most of that data is unstructured, organizations spend a high percentage of their budgets on structured data (Oracle, SQL Server, DB2).

Most of us today are very familiar with structured data: the data we store in databases, inside tables, in rows and columns, along the lines of:

The way we got data from these tables was by writing Structured Query Language (SQL), something like...

"SELECT [Last Name] FROM Table WHERE [Primary Key] = 2"

The result of which would be "Anderson".
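The lookup above can be sketched in Python with the standard-library `sqlite3` module. This is only an illustration: the table name `people`, the `[First Name]` column, and every row other than the "Anderson" result are assumptions, since the original table is not reproduced here.

```python
import sqlite3


def last_name_for(conn, primary_key):
    """Return the last name stored under the given primary key, or None."""
    row = conn.execute(
        "SELECT [Last Name] FROM people WHERE [Primary Key] = ?",
        (primary_key,),
    ).fetchone()
    return row[0] if row else None


# In-memory database with hypothetical sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people ("
    "[Primary Key] INTEGER PRIMARY KEY, [First Name] TEXT, [Last Name] TEXT)"
)
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [(1, "Alice", "Smith"), (2, "Bob", "Anderson")],
)

print(last_name_for(conn, 2))  # Anderson
```

The point is that the structure (named columns and a key) is what makes the question easy to ask.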

Now let's look at a piece of unstructured data, like an image...

You can't run a query against an image the way you can against a table to find out, say, a person's name. Yet this is the type of unstructured data that represents unrealized value, and it's the type of data that is growing exponentially. So the question is: how do you make this data useful?

Well, the answer is metadata. Metadata is data that describes data. It is searchable and discoverable, it helps you identify the useful data you possess, and it can help you turn that data into value. You could imagine that the metadata attached to this picture might look something like this...

[Operational Metadata]
  ImageType=Colour
  DateTaken=23/01/2018
  Resolution=640x480

[Custom Metadata]
  PeopleInPicture=4
  PeoplesNames=("Veronica", "Jeff", "Betty", "Paul")

If you had ten thousand pictures, you would then be able to query your data along the lines of...

"SELECT PICTURES WHERE PeoplesNames='Betty'"

Of course, this is an oversimplification of how search would work, but you get the point: metadata becomes key to discovering your valuable information, and because the data is different, you need to think about a different way to handle it.
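As a rough sketch of that query, the Python below filters a small catalog by the `PeoplesNames` metadata field. The file names and the second catalog entry are hypothetical, and a real system would use a search index rather than a linear scan.

```python
# Each entry pairs a (hypothetical) file name with its metadata.
catalog = [
    {
        "file": "team_photo.jpg",
        "operational": {
            "ImageType": "Colour",
            "DateTaken": "23/01/2018",
            "Resolution": "640x480",
        },
        "custom": {
            "PeopleInPicture": 4,
            "PeoplesNames": ("Veronica", "Jeff", "Betty", "Paul"),
        },
    },
    {
        "file": "office_party.jpg",
        "operational": {
            "ImageType": "Colour",
            "DateTaken": "24/01/2018",
            "Resolution": "640x480",
        },
        "custom": {"PeopleInPicture": 2, "PeoplesNames": ("Jeff", "Paul")},
    },
]


def pictures_with_person(catalog, name):
    """Return the file names whose custom metadata lists the given person."""
    return [
        entry["file"]
        for entry in catalog
        if name in entry["custom"].get("PeoplesNames", ())
    ]


print(pictures_with_person(catalog, "Betty"))  # ['team_photo.jpg']
```

The image bytes are never inspected; the answer comes entirely from the metadata.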

Data defines storage strategy

Data centers grew up around the architectures of the data they stored, and that data came largely from structured systems. The workhorse for structured systems and data has been block storage, which has served the enterprise well by delivering increasingly high performance and resiliency. With the rapid growth of unstructured data, however, organizations are starting to see challenges arise from trying to place unstructured data into the structured systems that sit on block storage.

A good example of this is SharePoint. Now don't get me wrong, SharePoint is a great application that promotes team collaboration, search and more, but it is challenged in one area: the way it handles unstructured data such as files. By default, each file uploaded to a SharePoint library is stored as a BLOB in a database, so as more and more users start uploading, your database fills up quickly, resulting in degraded performance and the need to license more databases to handle your file loads. There is a better way.

Object Storage platforms are revolutionizing the way we handle unstructured data. Files are simply put into a flat structure; there are no file hierarchies or blocks to define as in other storage systems. The Object Storage platform also manages the metadata associated with each object, which is key to finding the original data, as discussed earlier. Finally, all of these objects can be accessed through standard industry protocols (which I won't go into here, but just know they are used everywhere). A good example of a company that uses Object Storage is Spotify: millions of songs can't be managed through a file hierarchy, and each one needs to be found and retrieved quickly, hence object storage and metadata win out.
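To make the flat-structure idea concrete, here is a minimal in-memory sketch in Python. The `ObjectStore` class and its `put`, `get`, and `find` methods are hypothetical names for illustration only; they do not represent any particular product's API.

```python
import uuid


class ObjectStore:
    """Toy object store: a flat namespace of IDs, each with data and metadata."""

    def __init__(self):
        # object_id -> (data bytes, metadata dict); no folders, no hierarchy.
        self._objects = {}

    def put(self, data, **metadata):
        """Store the data with its metadata and return a generated object ID."""
        object_id = uuid.uuid4().hex
        self._objects[object_id] = (data, dict(metadata))
        return object_id

    def get(self, object_id):
        """Retrieve the stored bytes directly by object ID."""
        return self._objects[object_id][0]

    def find(self, **criteria):
        """Return the IDs of objects whose metadata matches every criterion."""
        return [
            object_id
            for object_id, (_, meta) in self._objects.items()
            if all(meta.get(key) == value for key, value in criteria.items())
        ]


store = ObjectStore()
song_id = store.put(b"audio bytes", artist="Example Artist", title="Example Song")
store.put(b"other bytes", artist="Another Artist", title="Another Song")

matches = store.find(artist="Example Artist")
print(matches == [song_id])  # True
```

Objects are addressed by ID and discovered by metadata, so no directory path ever has to be designed or traversed.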

There are many types of data in our organizations today that will ultimately benefit from Object Storage technologies: think about user files, logs, images, call logs and so on. These are all object types containing data that is not easily found or discovered, but when we enrich those objects with metadata, all of a sudden we can easily identify what we are looking for.

As a data company, we have been working with customers on these challenges for a long time. For managing unstructured data, we have something called the Hitachi Content Platform (https://www.hitachivantara.com/en-us/products/cloud-object-platform/content-platform.html). It's not just an Object Storage platform, but really a data intelligence platform that enables you to store your objects and manage their metadata, in addition to managing compliance of the objects for legal and other reasons. This type of system will become the data repository for all the applications you will build in the future to drive insights or competitive advantage, by aggregating all your information together and enabling analytics to run on top of it all.

In closing, as you continue to modernize your data center, it's critical to understand the types of data you will be managing today and in the future. Managing unstructured data is something you absolutely need to consider, as the data being created in that space is quickly outpacing the data we create in structured data systems.
