Journey From Traditional To Modern Datawarehouse
Introduction
Sir Tim Berners-Lee, famously known for inventing the World Wide Web, once said:
CDOs and CEOs know that being able to connect the dots within the organization is necessary for the growth of their business. C-level executives who can drive this organizational change successfully are seen as leaders and trendsetters both inside and outside the organization, hence their inclination towards modernization. However, enterprise platforms are still not being modernized at the rate they ought to be.
In this blog...
This blog discusses the evolution of the traditional data warehouse into the modern data warehouse. One section covers the challenges organizations face in moving from a traditional monolithic warehouse to a modern data lakehouse, and how they can start to address them.
It all began with Moore's prediction
In 1965, Gordon Moore, who went on to co-found Intel, predicted in Electronics magazine that the number of components on an integrated circuit would double every year, and that this trend would continue for the foreseeable future. In 1975 he revised the pace to roughly every two years. Half a century later, the prediction, now known as Moore's law, has broadly held, and it has driven many industry trends. It has taken the technology industry from the era of the pocket calculator to that of the smartphone.
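To make the compounding behind this prediction concrete, here is a minimal Python sketch of the doubling arithmetic. The function name `projected_count` and the starting value of 1,000 are illustrative assumptions, not historical figures; the two doubling periods compared are the one-year pace of the 1965 prediction and the two-year pace that is commonly quoted today.

```python
def projected_count(initial_count: float, years: float, doubling_period_years: float) -> float:
    """Return the count after `years` if it doubles every `doubling_period_years`."""
    return initial_count * 2 ** (years / doubling_period_years)


# Hypothetical starting point, chosen only to make the growth easy to read.
start = 1_000

for period in (1.0, 2.0):
    after_decade = projected_count(start, years=10, doubling_period_years=period)
    growth = after_decade / start
    print(f"doubling every {period:g} year(s): {growth:,.0f}x growth after 10 years")
```

Even at the slower two-year pace, a decade of doubling multiplies capacity many times over, which is why storage and compute kept getting cheaper relative to the data they had to hold.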
Evolution of Traditional Datawarehouse
Back in the 90s, Inmon's data warehouse (DWH) was emerging. It was seen as a silver bullet: a monolithic system that could contain all the information needed for an entire business's decision-making. This was largely true, and it was possible because data sizes were smaller back then, the variety of source data was limited, and incoming data arrived only once or twice a day.
Over the last 20 years, as Moore predicted, storage and computing technology has continued to become cheaper and faster. With the increased speed of data creation and the variety and volume of data coming from SaaS applications, web forms and IoT devices, the amount of data generated is simply enormous. Most of it is noise, and storing it was a liability to companies for a long time, until businesses realized that they could harness the profitable insights hidden within it.
Big-Data & Modern Data Warehouse
Hence an urgent need arose for an architecture or methodology, backed by technology, to convert this rapidly accumulating liability (data) into an asset. Industry gurus hoped that this new paradigm would yield actionable insights that were never possible with the traditional DWH. This is when we started hearing evolving buzzwords like big data, Hadoop, open-source systems, data lake, data swamp and now, finally, data lakehouse (which is what data experts like to call the modern DWH), an architecture that can cater to the variety, velocity and veracity of incoming data.
The Transition Phases
Organizations now acknowledge that there is a need for change, and they have envisioned the end state. It comes down to managing the transition from the traditional to the modern DWH. Moving to modernization from a traditional decision support system (DSS) requires the organization to have a growth mindset, with leadership that is open to change and a culture that is not afraid to adapt and learn.
The modernization then happens in two parts:
Transition phase 1: Moving from centralized, on-premise compute and storage infrastructure to distributed, cloud computing infrastructure. These two articles, article-1 and article-2, talk about transition phase 1, i.e. companies' journeys in modernizing their on-prem data platforms to the cloud.
Transition phase 2: Moving from the traditional DWH/DSS to the modern DWH/data lakehouse. Now let us go a bit deeper into this phase.
Challenges to Datawarehouse Modernization
More often than not, it is the organization's data and platform management team(s) that show reluctance towards modernization. Their reluctance is understandable and not without reason. Before we move away from the culture of monolithic, traditional DWH architectures, it is necessary to understand what worked well in the past and what these teams fear about the prospect of moving to a modern DWH.
Comparison of traditional and modern data warehouse
From the above table, it is evident that although the modern DWH fares well in most aspects, the traditional DWH wins when it comes to 'data trust'.
Trusted Data
In the traditional DWH, data trust develops because 'under the hood' information about how the final data was produced is available on request. It also develops because data processing is a standard, pre-defined set of processes in which data quality, completeness and accuracy checks are enforced.
Typically, in traditional data governance processes, the level and detail of transparency were decided and shared depending on the audience. Ideally, the data itself should never be a secret. The calculations, the number crunching and the assumptions made along the way should never be a secret either.
All of the above leads to data trust.
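As a rough illustration of what a standard, pre-defined set of checks can look like, here is a minimal Python sketch. Everything in it is an assumption made for the example: the column names, the thresholds, the sample rows and the helper names (`completeness_check`, `accuracy_check`, `row_count_check`, `run_checks`) are hypothetical and do not come from any particular warehouse or tool.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str  # kept so the "under the hood" outcome can be shared on request


def completeness_check(rows, required_columns):
    """Fail if any required column is missing or empty in any row."""
    missing = sum(
        1 for row in rows for col in required_columns if row.get(col) in (None, "")
    )
    return CheckResult("completeness", missing == 0, f"{missing} missing value(s)")


def accuracy_check(rows, column, low, high):
    """Fail if any numeric value in `column` falls outside [low, high]."""
    bad = [
        row for row in rows
        if isinstance(row.get(column), (int, float)) and not low <= row[column] <= high
    ]
    return CheckResult("accuracy", not bad, f"{len(bad)} value(s) outside [{low}, {high}]")


def row_count_check(rows, expected_count):
    """Fail if the load holds a different number of rows than the source reported."""
    detail = f"{len(rows)} of {expected_count} row(s) loaded"
    return CheckResult("row_count", len(rows) == expected_count, detail)


def run_checks(rows, expected_count):
    """Apply the same fixed set of checks to every load."""
    return [
        completeness_check(rows, ["order_id", "amount", "country"]),
        accuracy_check(rows, "amount", low=0, high=10_000),
        row_count_check(rows, expected_count),
    ]


# Hypothetical sample load with one missing country and one negative amount.
sample_rows = [
    {"order_id": 1, "amount": 120.0, "country": "SG"},
    {"order_id": 2, "amount": -5.0, "country": "IN"},
    {"order_id": 3, "amount": 40.0, "country": None},
]

for result in run_checks(sample_rows, expected_count=3):
    status = "PASS" if result.passed else "FAIL"
    print(f"{status} {result.name}: {result.detail}")
```

Because the same checks run on every load and each result carries its own explanation, the report can be handed to any audience that asks how a number was produced.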
However, this is not easy to achieve in a modern DWH. The graph below shows that human adaptability to the advancement of technology is far lower than it used to be. This makes it very difficult to effectively govern and administer new data sources at the rate at which they arrive in the modern data warehouse.
Million dollar question
Having acknowledged the challenge around data trust, the next question that arises is: how do we stop the erosion of data trust?
What adaptations of tools, technologies, processes and standards are necessary to evolve the traditional DWH towards modernization? How do we make sure that the data under the hood is understood and cataloged for easy user access? How do we democratize the mapped or cataloged information (about code, process and data) so that it is available to the users who need it at all times?
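One way to picture "cataloged information about code, process and data" is a small in-memory catalog that records, for each dataset, who owns it, what it means, which datasets feed it and how it is derived. The Python sketch below is purely illustrative: `CatalogEntry`, `DataCatalog` and the dataset names are hypothetical, and it does not represent the API of any real governance product.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    name: str
    owner: str
    description: str
    upstream: list = field(default_factory=list)  # names of source datasets
    transformation: str = ""  # plain-language note on how the data is derived


class DataCatalog:
    def __init__(self):
        self._entries = {}

    def register(self, entry):
        self._entries[entry.name] = entry

    def lineage(self, name):
        """Return every upstream dataset of `name`, nearest sources first."""
        seen = []
        queue = list(self._entries[name].upstream)
        while queue:
            current = queue.pop(0)
            if current in seen:
                continue
            seen.append(current)
            # Sources that are not cataloged yet are listed but not expanded.
            if current in self._entries:
                queue.extend(self._entries[current].upstream)
        return seen

    def search(self, keyword):
        """Find datasets whose name or description mentions `keyword`."""
        keyword = keyword.lower()
        return [
            entry.name
            for entry in self._entries.values()
            if keyword in entry.name.lower() or keyword in entry.description.lower()
        ]


# Hypothetical datasets, for illustration only.
catalog = DataCatalog()
catalog.register(CatalogEntry("raw_orders", "ingestion team", "Orders as received from the web shop"))
catalog.register(CatalogEntry(
    "clean_orders", "data engineering", "Deduplicated orders with valid amounts",
    upstream=["raw_orders"], transformation="drop duplicates, reject negative amounts",
))
catalog.register(CatalogEntry(
    "sales_report", "finance", "Monthly sales by customer",
    upstream=["clean_orders", "dim_customer"], transformation="sum amount per customer per month",
))

print(catalog.lineage("sales_report"))
print(catalog.search("order"))
```

A user who doubts a figure in `sales_report` can trace it back through `clean_orders` to `raw_orders` and read the transformation notes along the way, which is the kind of transparency the questions above are asking for.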
Conclusion
Big data is a liability until the organization learns to harness the valuable insights hidden in it. Modernization is the need of the hour, and the sooner organizations realize this and move towards it, the better for their growth. While modernizing their traditional warehouse, they need to ensure that data trust is not eroded but enhanced. This is a challenge that every data and platform management team understands, and these teams are working relentlessly to find the right solution for their stakeholders. A modern data governance tool, combined with well-defined data governance policies and standards, forms a key part of the way forward for a smooth and successful journey to the modern data warehouse.
This article is my attempt to share the experience I gained while working with varied customers on their data platforms, seeing their problems first hand and helping to solve them. If you think this article was helpful, please give it a thumbs up and share your views in the comment section. Thank you!
#Purview #AzureSynapseAnalytics #ModernDataWarehouse #DataLakehouse
3 years ago: Interesting perspective, well done. Arguably, data trust is largely unrelated to traditional or modern architecture. It is a governance issue that was there before and will remain if the business data owners and the data engineering teams do not work together. Data trust is more of a cultural issue in an organization. Of course, modern tools can ease the process of bridging the gaps, and modern data platform tools are helping to a certain extent in the journey. I think enterprises should ask WHY they want to move to modern platform architectures, identify their motivations and quantify them to harness ROI. NB: Databricks + Delta lakehouse gives the best of both, and is a lot cheaper, more scalable and more flexible than what we would have paid for MPP years back.