All data in one platform?

Could you store all your data in just one platform? While this model is often used in organizations, it is far from ideal. There is a lot of confusion about how to store types of data that have different structures. Often an organization does not notice this until the storage environment becomes sluggish because the storage media are filling up. Collected and stored data often differs in structure, which can have major implications for retrievability for reuse, storage capacity, and cost.

We often see organizations taking a "one Cloud fits all data" approach, asking first and foremost whether all data can be moved to the Cloud. The first thought is that simplicity serves the organization, but this is certainly not the most efficient solution. First, the different data types.

Structured data.

Structured data are data with a predefined format. This makes the data easy for other systems to find and therefore easier to analyze. The data is recorded in a table format that defines the relationships between the stored elements. Examples include an SQL database or an Excel file built from the familiar rows and columns. Data from ERP and CRM systems is also structured. Storage methods for structured data are usually costly.

Unstructured data.

Unstructured data are data that are much less easily captured in a definition: think of photo files, movies, or machine-generated data (IoT). Unstructured data should be seen as food for our databases. Stored formulas and source code, for example, keep us from reinventing the wheel each time. Storage methods for this kind of data are generally very cost-efficient.

And it is precisely this data that deserves (storage) attention. Unstructured data currently make up 80% of our average data volume. They are also growing the fastest, compress much less well, and are difficult to find in "normal" storage systems that were not specially designed for unstructured data.

Knowing this, an optimally scalable storage infrastructure is desirable. However, (affordable) scalability is far from a given in the average storage infrastructure, whether in a Cloud or on-premises.

So how?

The answer is fairly simple, and the solution is not very complex: leave data in the storage structure that suits it best. Structured data belongs in the well-known hierarchical Block and File storage systems, and unstructured data in scalable, modern storage models such as Object storage. You can read how this works in: https://dutchitchannel.nl/702217/object-storage-beheerst-data-en-it-budgetten.html
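As a rough illustration of that split, the sketch below decides per file whether it belongs on Block/File storage or on Object storage. It is only a minimal sketch under stated assumptions: the function name, the tier labels, and the list of "structured" file extensions are hypothetical and chosen for the example, not taken from any product.

```python
from pathlib import PurePath

# Hypothetical list of extensions treated as structured data (databases, spreadsheets).
# This is an assumption for the example; a real environment would define its own rules.
STRUCTURED_EXTENSIONS = {".db", ".sqlite", ".xls", ".xlsx"}


def choose_storage_tier(filename: str) -> str:
    """Return 'block_or_file' for structured data and 'object' for everything else."""
    suffix = PurePath(filename).suffix.lower()
    if suffix in STRUCTURED_EXTENSIONS:
        return "block_or_file"
    # Photos, movies, sensor dumps and anything unrecognized go to Object storage.
    return "object"


if __name__ == "__main__":
    for name in ["orders.xlsx", "crm.sqlite", "factory_tour.mp4", "sensor_dump"]:
        print(name, "->", choose_storage_tier(name))
```

In practice the decision would more likely be based on the source system (ERP, CRM, IoT feed) than on a file extension, but the principle is the same: the type of data determines where it is stored.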

But we are not quite there yet, because this only works well if our stored data can actually be found, regardless of the protocol used to store it. The solution is to use a:

Data management system.

We have just read that the mix of data (structured and unstructured) should be spread across different storage protocols so that each type serves its purpose better. For several years now, data management systems have made it possible to greatly increase data discoverability. The barriers one might expect from this data mix are thereby eliminated.

Our data is not only findable. It can also be shared with multiple users regardless of location (global), even though it is stored on multiple protocols (silos) and storage media. At the same time, "smart tools" such as analytical or computational tools from different Cloud providers can be used on it, permanently or temporarily, via RESTful APIs.
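The core idea of such a data management layer can be sketched as a single catalog that records where each dataset lives, so that a search does not depend on the storage protocol. This is a simplified sketch, not how any particular data management product works. The class names, fields, and example locations are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CatalogEntry:
    name: str
    protocol: str   # for example "block", "file" or "object"
    location: str   # hypothetical silo or site name
    tags: frozenset = field(default_factory=frozenset)


class DataCatalog:
    """One index over all silos: search by tag, independent of protocol."""

    def __init__(self):
        self._entries = []

    def register(self, name, protocol, location, tags=()):
        entry = CatalogEntry(name, protocol, location,
                             frozenset(t.lower() for t in tags))
        self._entries.append(entry)
        return entry

    def find(self, tag):
        wanted = tag.lower()
        return [e for e in self._entries if wanted in e.tags]


catalog = DataCatalog()
catalog.register("crm_customers", "block", "on-prem-site-a", tags=["customer"])
catalog.register("support_call_recordings", "object", "cloud-region-b",
                 tags=["customer", "audio"])

# One query returns hits from both the block silo and the object silo.
hits = catalog.find("customer")
for hit in hits:
    print(hit.name, hit.protocol, hit.location)
```

The user searches for "customer" and does not need to know which protocol or location holds the result. That is the barrier the data management system removes.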

When asked whether this could work as a data platform, the answer is a resounding yes. Whether it runs on (your own) physical hardware or in a Cloud environment matters much less. Flexibility and convenience serve people.

Harold Koenders
