The (Modern) Big Data Platform
Disclaimer: The views expressed here are mine alone and do not necessarily reflect the view of my current, former, or future employers.
Although technology constantly changes, we largely agree that data volumes continue to grow and that data has become one of an organization's most valuable assets. I previously stated, "Organizations that effectively utilize their data will be the ones to have a competitive advantage moving to the future of the big data era." This statement continues to hold true, but recent capabilities offer more effective ways to obtain value from data. In the previous article, I argued that the Enterprise Data Warehouse should be augmented with a Data Lake based upon distributed technology, such as Hadoop, to support expanding big data use cases for data science, Machine Learning (ML), and Artificial Intelligence (AI). This post aims to provide updates on the ever-changing landscape of the Modern Big Data Platform.
Let's quickly review the need for a Data Warehouse complemented with a Data Lake architecture. With this design, the Data Warehouse continues to be the workhorse for structured reporting and analysis, supporting complex structured data, and handling some unstructured data. There were a few gaps with this pattern:
Data Lakes offer a way to support the three areas listed above without losing the capabilities of the legacy environment. This approach yields a low barrier of entry and a low initial cost to add big data and analytical workloads to the enterprise. Data scientists are empowered with varied data for data discovery and AI/ML use cases, at greater speed and scale than with a Data Warehouse alone.
Example Legacy Big Data Reference Architecture
Although this legacy approach provides quick wins to support expanding use cases, organizations must make strategic choices with data, specifically tradeoffs for data management, sizing, and accessibility. Below I will describe the challenges with these tradeoffs and provide an updated view of Big Data for 2022.
Fast forward to 2022: I find this architecture adds technical debt, becomes a challenge to manage and scale, and does not deliver on the promised long-term benefits. Enter the Cloud. It unlocks value by removing the need to choose between a Data Warehouse and a Data Lake. I ended the previous article with, "as we continue to obtain more and more data, I believe the Data Warehouse and Data Lake will become more and more blurred." Today the Data Warehouse and Data Lake have completely converged in the form of the Data Cloud. Cloud Data Platforms have been around for many years, but most have been lifted and shifted rather than natively architected for the cloud, which reduces their effectiveness.
Data Cloud Architecture. Source: https://www.snowflake.com/blog/beyond-modern-data-architecture/
The Data Cloud provides a single copy of data under one platform, removing data silos. Data can take many forms, be it structured, semi-structured, or unstructured, and the platform provides users with greater value at a lower total cost of ownership (TCO) than the overhead of managing and maintaining multiple platforms. In addition, the Data Cloud provides an always-connected, continuously updated platform with no upgrades or patching required.
The architecture allows you to scale your data and compute separately and virtually without limit. You don't need to worry about degradation or slowed performance over time. Need more horsepower? No problem: click a button, and away you go within seconds. No longer want to pay for the extra compute or data for a throw-away analysis? Stop running the compute job and delete the data. It's that simple.
The Data Cloud provides native ways for organizations to clone and share data within an environment without moving actual data. Cloning provides developers, data scientists, and other users almost instant access to data, whatever form it takes, in a secured and governed manner. In Deloitte's 2022 Tech Trends, the authors state, "During the next 18 to 24 months, we expect to see more organizations explore opportunities to create seamless, secure data-sharing capabilities that can help them monetize their own information assets and accomplish business goals using other people's data." The Data Cloud also provides the ability to securely share data assets within your organization, outside your organization, or quickly obtain new data to enhance your own. Imagine needing to leverage COVID-19 data and use it for demand planning - then sharing the results with your management team and suppliers - all with a press of a button.
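To make the idea of cloning "without moving actual data" more concrete, here is a minimal conceptual sketch in Python. It is not any vendor's implementation; the Block and Table classes and the table names are hypothetical, invented purely for illustration. The assumption is that a table is only metadata pointing at immutable storage blocks, so a clone copies the pointers rather than the data, and later writes to the clone add new blocks without touching the original.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Block:
    """An immutable chunk of stored rows (hypothetical storage unit)."""
    rows: tuple


@dataclass
class Table:
    """A table is only metadata: an ordered list of references to blocks."""
    name: str
    blocks: list = field(default_factory=list)

    def append(self, rows):
        # New data always lands in a new block; existing blocks never change.
        self.blocks.append(Block(tuple(rows)))

    def clone(self, new_name):
        # Copy the list of references, not the blocks themselves.
        return Table(new_name, list(self.blocks))

    def rows(self):
        # Flatten all referenced blocks into one list of rows.
        return [row for block in self.blocks for row in block.rows]


# Hypothetical production table with two rows of sample data.
prod = Table("sales")
prod.append([("2022-01-01", 100), ("2022-01-02", 120)])

# The clone is available immediately and shares the same underlying block.
dev = prod.clone("sales_dev")
shared_before_write = dev.blocks[0] is prod.blocks[0]

# Writing to the clone adds a new block and leaves production untouched.
dev.append([("2022-01-03", 90)])
```

In this toy model the clone costs only as much as copying a list of references, which is why access can feel almost instant regardless of data size. A real platform layers security, governance, and transactional guarantees on top of this basic idea.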
I previously suggested that the way to modernize your data architecture was by adding a Data Lake to your existing architecture, expanding capabilities, and supporting new and varied data and use cases. Users had to choose where to put their data, creating additional data silos and bottlenecks. This is simply no longer the case. The modern big data platform is here. And because the cloud continuously delivers automatic updates and newly released features, the investment to shift to the Data Cloud will pay off, not just today but throughout the ever-evolving data landscape, providing organizations with a future-proof platform and a competitive advantage for years to come.