Data Strategy – Time to re-evaluate?
It seems a long time ago that the magic 3 V's of volume, variety and velocity were unleashed on the world to describe the evolution of data that organizations were about to see. In fact, it has been a whole 17 years since Doug Laney first used the term.
The story was that we needed to get ready for a new wave of data. We needed to be ready to store more data, to take in data that did not look like the data we traditionally got from operational systems (such as unstructured text), and to handle data arriving more quickly. This was in the web era, long before mobile and social took off.
Then we had the Big Data storm, where all the V's got bigger, faster and more diverse. Social media had arrived and the use of external data to help make decisions was on the rise. I went on record at the Gartner summit in 2014 saying I thought the buzzword would die out. It turns out it is still around, but we hear it much less frequently nowadays.
An emerging ecosystem of options
To deal with big data we needed new ways to store it. This led to the emergence of a new ecosystem of database options to support different needs.
Databases with new models and schemas were created, along with new query approaches, to overcome gaps in what was available.
Cue the rapid and confusing rise of the data lake, which I discussed in this blog back in 2014. It seems most companies adopted a modified data landscape rather than the often-touted "Hadoop-based data lake". What I did not consider then was the relentless march of the NoSQL database and the impact it has had.
It’s all just data
Today I think most organizations have stopped thinking about "Big Data". Now it is just the data that they have to handle to meet different business requirements.
Importantly, many of those organizations are moving the discussion on to how they get value from that most valuable of assets. It is no coincidence that Doug Laney is now focused more on Infonomics (getting value from the data you have and trying to monetize it) than on the 3 V's.
It is great that the focus is on deriving value from data. That needs to accelerate as it is what will keep organizations alive.
On top of that, could it be that organizations are missing a trick when it comes to simplifying their database landscape? I contend the answer is yes!
The complexity problem
The rapid evolution of business requirements has left organizations with a data landscape that has become incredibly complex. Many organizations are significantly overspending on managing that complex, bloated data landscape. This complexity is a massive problem in the context of the impending GDPR regulation.
Organizations have many tables and many databases. On top of that, they often have a huge variety of database types, including relational databases, columnar databases, NoSQL databases, and the list goes on. Organizations have reached this point because they had to meet their business needs: the databases they had could not support what they needed to do, when they needed to do it.
The question is: Is this still the case?
The Simplification Mantra
I believe it is time for an approach that drives towards a refresh. It is time to take stock. Think simplification of the data landscape while continuing to meet the business needs of today and of the future.
That refresh and simplification will help with costs and manageability, and it will help with GDPR. By reducing complexity at source, organizations will be better placed to use data to create value rather than passing chaos and complexity on to the value creators!
The Step Change
The progress in databases has been almost as relentless as the progress in other areas. I am going to highlight this with Microsoft examples, but it is not constrained to Microsoft.
Just think about things like Azure Cosmos DB, which can obliterate the need for several different NoSQL databases and give you powerful SLAs you can run your business on. It provides multiple models and multiple query styles in one go, without having to rewrite applications. Imagine if you could go from four different NoSQL databases to one. Would that make a difference to complexity? Would it help you with GDPR?
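As a rough, hedged illustration of that kind of consolidation, here is a minimal Python sketch using the azure-cosmos SDK. The account endpoint, key and the "retail"/"orders" names are placeholders I made up for the example, not taken from any real deployment.

```python
# Minimal sketch: one Cosmos DB container standing in for a separate
# document store, with a SQL-style query over the same data.
from azure.cosmos import CosmosClient, PartitionKey

# Placeholder endpoint and key - substitute your own account details.
client = CosmosClient("https://<your-account>.documents.azure.com:443/",
                      credential="<your-key>")

# Create (or reuse) a database and a container with a partition key.
database = client.create_database_if_not_exists(id="retail")
container = database.create_container_if_not_exists(
    id="orders",
    partition_key=PartitionKey(path="/customerId"),
)

# Store a schema-free JSON document, as you would in a typical NoSQL store.
container.upsert_item({
    "id": "order-1001",
    "customerId": "cust-42",
    "items": [{"sku": "A-100", "qty": 2}],
    "status": "shipped",
})

# Query the same data with a familiar SQL-like syntax - no separate engine.
results = container.query_items(
    query="SELECT c.id, c.status FROM c WHERE c.customerId = @cust",
    parameters=[{"name": "@cust", "value": "cust-42"}],
    enable_cross_partition_query=True,
)
for item in results:
    print(item)
```

The specific calls matter less than the point: document data, SQL-style querying and the service-level guarantees all live in one managed service rather than in several separately run engines.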
Think about the fact that SQL Server can now run on Linux. Would that make you reconsider whether an open source database is really better than an enterprise-grade, best-in-class equivalent you can now run there, when security and reliability around data underpin everything you do?
Look at the fact that graph processing is available in SQL Server and that machine learning capabilities are now pervasive in databases, with SQL Server supporting Python and R. Would that remove the need to create separate data marts for analytics processing, reducing complexity and data sprawl?
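To make that a little more concrete, here is a small sketch of the graph capability, driven from Python via pyodbc against a SQL Server 2017 or later instance. The connection details and the Person/friendOf tables are purely illustrative assumptions for the example.

```python
# Sketch: create graph node/edge tables and traverse them with MATCH,
# all inside an ordinary SQL Server database.
import pyodbc

# Placeholder connection string - substitute your own server and credentials.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=your-server;DATABASE=your-db;UID=your-user;PWD=your-password"
)
cursor = conn.cursor()

# One-time setup: graph objects live alongside regular relational tables.
cursor.execute("CREATE TABLE Person (id INT PRIMARY KEY, name NVARCHAR(100)) AS NODE;")
cursor.execute("CREATE TABLE friendOf AS EDGE;")
conn.commit()

# Traverse relationships with MATCH instead of standing up a separate
# graph database (assumes Person rows and friendOf edges have been loaded).
cursor.execute(
    """
    SELECT p2.name
    FROM Person p1, friendOf, Person p2
    WHERE MATCH(p1-(friendOf)->p2)
      AND p1.name = ?;
    """,
    "Alice",
)
for row in cursor.fetchall():
    print(row.name)
```

In the same spirit, SQL Server Machine Learning Services can run R or Python scripts next to the data via sp_execute_external_script, which is part of what makes a separate analytics mart less necessary in some scenarios.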
New deployment options
Finally, let's look at the new deployment options.
- Flexible agreements that let you slowly move to the cloud.
- The possibility to move everything you previously ran on premises into the cloud unchanged, reducing the management overhead of hardware and all that CapEx without changing any of your technology choices.
- The possibility to use managed services in the cloud with powerful SLAs, reducing administration overhead while enabling new modes of data storage to support emerging business needs.
- The possibility to build hybrid solutions that span into the cloud as needed.
- The capability to stand up what you want, when you want it, all backed by clear SLAs.
The modern data estate is unleashed. It spans all deployment modes, offers almost every type of database you might need and helps you pick the right ones for the job. Options abound for simplification, consolidation, modernization and agility within your data landscape, all without compromising on meeting your business needs.
Moving forwards
The forward momentum in database capabilities and their deployment options is staggering. Many organizations are not on top of it. Previous decisions, even from as little as 12-18 months ago, are worth revisiting to check whether your data landscape is running as efficiently as possible.
Progressive organizations, some prompted by GDPR, are busy documenting their data assets, in most cases better than ever before. Most of them, though, are focused on what data is where, how to secure it and how to ensure it is used appropriately.
Many are not looking at which database that data is stored in, and whether migration and/or consolidation could make life much easier. Be sure to think about your data landscape and consider how it can evolve.
Here are some questions:
- Have you recently looked at where you are storing your data, and do you understand why you have it there? Have you evaluated whether there is a better option today?
- Do you know how much it is costing you to manage and maintain your data estate, and could reduced complexity lower that? If lowering IT costs is on your radar, this is a sure-fire way to do it.
- Have you considered whether your GDPR compliance would be easier with a less complex environment to manage? Is database consolidation an option you have considered on your GDPR journey? If not, why not?
- When did you last evaluate which databases need to stay on premises, which can be deployed in a hybrid mode and which could be moved entirely to the cloud? If not recently, you may be constraining your potential based on old options and carrying costs you do not need.
In Conclusion
A modern data estate provides options that meet you where you are. As you consider your data landscape moving forward, think about whether you are missing a trick by not looking at the big picture and seeking vendors who can, perhaps together with partners, cover the entire data estate and all that it entails.
What do you think?
Note that while I work for Microsoft, the opinions expressed on this page are 100% my own and do not necessarily reflect those of my employer.
Director Solution Engineer and Data Architect, Teradata Brand Amplifier:
Good read and thanks for sharing. I agree. I also believe that standardizing on just a few ways to store and deliver your data will ultimately reduce cost and complexity, which in turn also reduces cost. Of course business rules can be very complex, but while standardizing you have a chance to revise them and redesign the application of those business rules in a more understandable way. Maybe even introduce a business rules engine.
Senior data architect:
The usual line of thinking is that you reduce complexity by reducing the number of databases your data is stored in. The complexity in data, or in knowing where your data is (GDPR is just a trigger to take stock once again), is not in the number of databases you have to manage; it is in the complexity of the data structures and the (lack of a) data model. If you migrated all the data into one can-do-it-all database (the sales argument of MarkLogic), you would not have reduced the complexity of managing the data, and the access to it, one single bit. It is the same complexity. Or rather, it is more complex, because of a total lack of segregation of duties and concerns across the data and the resulting forest-and-the-trees problem you thus create. The trick is in the management of your data: the lineage, the access control, the data model, the semantic glossary of all the data. That is a field where Microsoft is seriously lacking in capabilities (to name a company). Having to manage fewer databases does do something for your run costs (licenses, compute power, man-hours).