The Economics of Data – Making sense of Computer Storage in a Cognitive Era

Tyre changes were introduced in Formula One to encourage more overtaking; so, if STOPPING your business and giving the opposition a chance to overtake is okay with you, carry on doing things the old way. But if you run a NON-STOP business, want to be ready to take advantage of data in the cognitive era, and consequently need to change your infrastructure dynamically while still delivering to your clients, then you need the economic advantages of software-defined, vendor-neutral, hardware-independent data optimisation.

That Was Then, This Is Now

There was a time, not so long ago, when data storage was primarily the domain of large institutions, almost entirely held in structured form for computers to aggregate and then spit out on reams of paper for finance teams, sellers, marketeers, stock controllers and the like to analyse to the best of their ability. But that was THEN, and this is NOW.

Now is a time when most of the world’s data is unstructured – text, sound, images, video – aggregated by a myriad of organisations with data feeds from literally billions of devices, much of it analysed BY the computers to give additional insight to whoever owns the feed, or is willing to subscribe to it over the internet.

This hyper-growth in data collected, stored and processed is usually referred to in quite emotive, even destructive terms. At various times you will see it described as a Tsunami, an Avalanche or even an Explosion – all fairly violent and thankfully rare events. Use of such terms misses the point somewhat. What we are experiencing – relentless growth in data volumes – is not rare but ever present, and is not an ‘imminent disaster’ but an early indicator of market opportunity for leading businesses. As such, those businesses need to adapt and adopt working practices to cope – to transform into organisations fit for the 21st century.

The Data Economics Objective

The data economics objective therefore has to be to constantly seek to optimize and save on today’s infrastructure in order to invest in the transformation needed to handle tomorrow’s workloads. Some might say this is analogous to changing the wheels on a moving car. Certainly, moving from one platform to another has to be done while business continues to be transacted in this 24x7xforever world. Doing so requires breaking out of the paradigm that has existed since the dawn of the computer era – buying expensive integrated appliances – and instead optimizing the data independently of the hardware, freeing it from the underlying infrastructure. If you could seamlessly and transparently move from YESTERDAY’s technology to TODAY’s – transforming, optimising and utilising your data along the way – while staying ready for TOMORROW’s new challenges, wouldn’t that be the way to go, lessening risk and maximising opportunity?

To do so requires a vendor-neutral data optimization layer that can interface with both existing infrastructures and potential future environments, such that data can reside on the medium appropriate to its access and value characteristics. Such hardware independence has to extend to all available media types – solid state, spinning disk (both magnetic and optical) and even tape – with the ability to manage data through its natural life cycle: from heavy interaction and processing, through warm and then cold archive, to final destruction where necessary.
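To make that life-cycle idea concrete, here is a minimal sketch in Python of how such a placement policy might look. The tier names, age thresholds and retention rule are illustrative assumptions only, not any particular vendor’s policy engine.

```python
from datetime import datetime, timedelta
from enum import Enum


class Tier(Enum):
    """Hypothetical media tiers, ordered from most to least expensive per TB."""
    FLASH = "solid state"
    DISK = "spinning disk"
    TAPE = "tape archive"
    DESTROY = "secure destruction"


def choose_tier(last_access: datetime, retention_expiry: datetime) -> Tier:
    """Place a data object on the cheapest tier that still matches its access
    and value characteristics (thresholds are illustrative assumptions)."""
    now = datetime.now()
    if now >= retention_expiry:
        return Tier.DESTROY                      # end of the natural life cycle
    age = now - last_access
    if age < timedelta(days=30):
        return Tier.FLASH                        # hot: heavy interaction and processing
    if age < timedelta(days=365):
        return Tier.DISK                         # warm archive
    return Tier.TAPE                             # cold archive


# Example: data untouched for two years but retained for another five goes to tape
print(choose_tier(datetime.now() - timedelta(days=730),
                  datetime.now() + timedelta(days=5 * 365)))
```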

This data portability reduces the costs and risks associated with traditional platform migration which, no matter how well practised and planned, is still like that Formula One wheel change – Stop…then Go – rather than a continuous, seamless migration that is transparent to business and users alike.

Data Economics Means Downstream Efficiency Too

Data portability is only one part of the data economics landscape. Optimizing the data also requires storing it in the most efficient way possible – not only on the most cost-effective hardware but also in its most dense form. Savings here then multiply as data is replicated by downstream processes – or not replicated but merely ‘indicated’ – reducing the need for costly multiple full-size versions.
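As a toy illustration of ‘indicated, not replicated’, the sketch below stores each unique payload once and hands downstream processes a hash reference instead of another full-size copy. The class and method names are invented for this example.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: downstream 'copies' are hash references
    ('indications') rather than additional full-size replicas."""

    def __init__(self) -> None:
        self._blocks: dict[str, bytes] = {}   # hash -> unique payload, stored once
        self._refs: dict[str, int] = {}       # hash -> number of downstream references

    def put(self, payload: bytes) -> str:
        key = hashlib.sha256(payload).hexdigest()
        if key not in self._blocks:
            self._blocks[key] = payload        # physical copy kept in its densest form
        self._refs[key] = self._refs.get(key, 0) + 1
        return key                             # downstream processes keep only this pointer

    def physical_bytes(self) -> int:
        return sum(len(b) for b in self._blocks.values())

    def logical_bytes(self) -> int:
        return sum(len(self._blocks[k]) * n for k, n in self._refs.items())


store = DedupStore()
ref_a = store.put(b"quarterly sales extract")
ref_b = store.put(b"quarterly sales extract")   # a 'copy' for a downstream process
assert ref_a == ref_b
print(store.logical_bytes(), "logical bytes vs", store.physical_bytes(), "physical")
```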

The move to software-based data optimization should be prioritised before deploying any hardware-related capabilities, if only because it reduces the overall capacity requirement for such a move in the first place. Saving money before spending money is a key component of data economics. Such savings can then be invested in moving to newer technologies faster, gaining competitive advantage – a great OVERTAKING manoeuvre. Imagine how much could be saved by moving your data from old technology to new while requiring only half, a third or even a quarter as much capacity as before – savings not only in capital expenditure and ongoing maintenance but in power, cooling and real estate too, not to mention the time and resources that would otherwise need to be deployed.
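As a back-of-the-envelope sketch of that arithmetic (every figure below is an assumption to be replaced with your own numbers, not a quotation):

```python
def migration_footprint(current_tb: float, reduction_ratio: float,
                        cost_per_tb: float, kwh_per_tb_year: float) -> dict:
    """Illustrative arithmetic only: capacity, capital and power needed for a
    migration if optimisation first shrinks the data by reduction_ratio
    (2.0 = half as much, 4.0 = a quarter as much)."""
    new_tb = current_tb / reduction_ratio
    return {
        "capacity_tb": new_tb,
        "capital_cost": new_tb * cost_per_tb,
        "power_kwh_per_year": new_tb * kwh_per_tb_year,
    }


# Example: 1 PB today, 3:1 optimisation, assumed $25/TB and 8 kWh per TB per year
print(migration_footprint(1000, 3.0, 25.0, 8.0))
```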

Another factor to consider is that much of the modern data flowing into businesses in this period of hyper-growth is moving away from traditional block-based storage towards file- and object-based formats. This in turn affects the whole enterprise as devices and file systems adapt to the demands of reliability, capacity and accessibility. Old and outdated concepts of spinning-disk RAID incur too much capacity overhead and have struggled with reliability as drive sizes have grown, with a consequential impact on rebuild times. Furthermore, the disks themselves have failed to meet performance requirements, hence the increasing move to solid state devices for high-speed accessibility.
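A rough calculation shows why rebuild times have become such a problem as drive sizes grow; the capacities and rebuild rate below are assumptions for illustration.

```python
def rebuild_hours(drive_tb: float, rebuild_mb_per_s: float) -> float:
    """Lower bound on the time to rebuild a single failed drive: the whole
    drive must be rewritten at the sustained rebuild rate (assumed constant)."""
    return (drive_tb * 1_000_000) / rebuild_mb_per_s / 3600


# A 2 TB drive versus a 20 TB drive, both rebuilding at an assumed 100 MB/s
print(f"{rebuild_hours(2, 100):.1f} h vs {rebuild_hours(20, 100):.1f} h")
# ~5.6 h vs ~55.6 h: a ten-fold longer window of reduced protection
```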

Content Repositories Are A Key to Long-term Data Economics

Parallel file systems – long the preserve of high performance computing (HPC) environments – and data dispersal techniques – using erasure coding instead of RAID – seek to address these issues. Improvements in storage efficiency of up to 70% over traditional methods, while improving reliability to seven ‘nines’ and beyond, are just some of the advantages on offer. Again, their implementation as a software construct rather than as part of an all-inclusive appliance maximises flexibility of deployment and prevents lock-in to today’s technological offerings. This form of content repository, by its very nature, is here for the long term. However, how much of today’s hardware will still be desirable, or even usable, in 3-4 years? One only has to look at past appliance examples such as EMC’s Centera to recognise that appliance hardware lock-in is the entrance to a cul-de-sac, no matter how bright and shiny the concept is at launch. Deduplication appliances offered similar promises of savings in the short term but still act as dead-end data silos when it comes time to replace them. That hardware-independent, vendor-neutral software layer has to be in place to lubricate the move from today’s best option to tomorrow’s…whatever that may be.
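For a feel of where such efficiency figures come from, the sketch below compares a hypothetical 10+4 dispersal scheme with 3-way replication; the actual saving depends on the scheme chosen and the baseline it replaces.

```python
def storage_overhead(data_fragments: int, parity_fragments: int) -> float:
    """Raw capacity required per unit of usable data for a k+m erasure code
    (e.g. 10 data fragments plus 4 parity fragments)."""
    return (data_fragments + parity_fragments) / data_fragments


replication = 3.0                        # 3 full copies: 3x raw per usable byte
dispersal = storage_overhead(10, 4)      # 1.4x raw per usable byte
saving = 1 - dispersal / replication
print(f"{saving:.0%} less raw capacity than 3-way replication")   # roughly 53%
```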

Another advantage of using file systems from the HPC world is the readiness of the data to be accessed by artificial intelligence (AI) capabilities to derive further insight, and thus value, from the underlying numbers. Data pools became lakes and nowadays even oceans in an effort to aggregate that insight. Having all the eggs available in one virtual basket gives key advantages to the data stakeholders – the bigger the data sets, the more accurate the AI vision derived, and thus the greater the business advantage gained; the higher the performance of the underlying storage, the faster that advantage can be realised and exploited. Future moves to quantum computing will only make such wholesale data warehousing even more compelling.

Summary

Data optimization is key to maintaining and exploiting data in the hyper-growth environment of today’s modern business. Optimization reduces costs today, allowing savings which can be invested in tomorrow’s transformative environment. Such data optimization has to be external to any given appliance, since the whole point is to provide portability across platforms – both across the data life cycle as data value changes, and over time as underlying hardware matures, develops and is eventually replaced. Furthermore, migrating to a vendor-neutral, hardware-independent, optimized data ecosystem should be done as soon as possible to realise immediate savings and gain competitive advantage.

While many vendors would claim they can do this – and many can do parts of it – a vital component is the ability to pay only once for the software capability. Pay only once for data virtualisation; don’t pay again when your data needs to move from block-based to object-based storage; ensure that the cost of managing the systems is also included, and that the offering is truly vendor neutral as well as hardware independent. If the capability is promised “only if you buy this box”, you are locking yourself into the age-old merry-go-round of buy-upgrade-replace – and the same STOP-GO wheel change you have today.

Want to know where to license flexible data optimization? Just ask an IBMer.
