Modern Data Stack - Part 1

Sankha Mitra

Data Architecture | Data Warehousing | Business Intelligence | Cloud Architecture and Solutions | Program Management | Practice Management

Published: March 15, 2024

What is the modern data stack?

The birth of cloud data warehouses, with their massively parallel processing (MPP) capabilities and first-class SQL support, has made processing large volumes of data faster and cheaper. This has also led to the development of many cloud-native data tools that are easy to integrate, scalable, and economical. These tools and technologies are collectively referred to as the Modern Data Stack (MDS).

Modern data stack and modern data platform: differences

A data platform is the set of components through which data flows, while a data stack is the set of tools that serve these components.

Key characteristics

1. Cloud-first approach

2. Built around cloud data warehouse/lake

3. Focus on solving one problem at a time

4. Offered as SaaS or open-core

5. Low entry barrier

6. Actively supported by communities

1. Cloud-first - Modern public cloud vendors have enabled MDS tools to become highly elastic and scalable. This makes it easy for organizations to integrate them into their existing cloud infrastructure.

2. Built around cloud data warehouse/lake - Modern data stack tools recognize that a central cloud data warehouse/lake is what fuels robust data analytics. So they are designed to integrate seamlessly with all the prominent cloud data warehouses (such as Redshift, BigQuery, Snowflake, and Databricks) and take full advantage of their features.

3. Focus on solving one specific problem at a time - The modern data stack is a maze of tools connected by the different stages of the data pipeline. Each tool focuses on one specific aspect of data processing or management. This enables modern data stack tools to fit into a variety of architectures and plug into any existing stack with few or no changes.

4. Offered as SaaS or open-core - Modern data stack tools are mostly offered as SaaS (Software as a Service). In some cases, the core components are open-source and come with paid add-on features like end-to-end hosting and professional support.

5. Low entry barrier - Modern data stack tools are packaged in easy pay-as-you-go and usage-based pricing models. Data practitioners can explore new tools, their features, and their utility before making big commitments. This saves money and time. MDS tools are also designed to be low-code or even no-code. Tool setup can be completed in a few hours and does not require deep technical expertise or a large time investment.

6. Actively supported by communities - Modern data stack solution providers invest considerable time and effort in community building.

What was the need for a modern data stack?

Three major factors

The emergence of Hadoop and the public cloud

Prior to Hadoop, scaling infrastructure largely meant scaling vertically, so data processing demanded a large upfront investment. With the emergence of Hadoop, it became possible to horizontally scale storage and compute on commodity hardware. But even then, the user experience was clunky (MapReduce), and only large organizations could invest in the special skills required to make it work well.

Then, as public cloud platforms became inexpensive and accessible, even smaller companies could afford storage and compute in the cloud.

The launch of Amazon's Redshift

Meanwhile, the microservices architecture had popularized NoSQL and non-relational databases. When loaded into a Hadoop cluster for analytics, this non-relational data was hard to process using SQL. This forced data teams to use other programming languages like Java, Scala, and Python to process data. Organizations came to depend on expensive engineering resources and highly specialized skills.

Data democracy took a severe hit. Amazon's Redshift changed all that.

Announced in 2012, Redshift is widely regarded as the first cloud data warehouse. It allowed large volumes of data to be stored on horizontally scalable infrastructure, and also made it possible to query the data using plain SQL.
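To make the contrast concrete, here is a minimal sketch of the same aggregation written twice: once as hand-coded map, shuffle, and reduce steps, and once as a single SQL statement. It assumes Python's built-in sqlite3 module as a local stand-in for a warehouse (it is not Redshift), and the sample data and function names are hypothetical.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (region, order_amount) pairs.
orders = [("EU", 120.0), ("US", 80.0), ("EU", 30.0), ("APAC", 50.0), ("US", 20.0)]


def total_by_region_mapreduce(rows):
    """Hand-written map, shuffle, and reduce steps, in the spirit of MapReduce."""
    # Map: emit one (key, value) pair per input row.
    mapped = [(region, amount) for region, amount in rows]
    # Shuffle: group values by key.
    grouped = defaultdict(list)
    for region, amount in mapped:
        grouped[region].append(amount)
    # Reduce: collapse each group into a single total.
    return {region: sum(amounts) for region, amounts in grouped.items()}


def total_by_region_sql(rows):
    """The same aggregation expressed as one declarative SQL statement."""
    # sqlite3 is only a local stand-in here; a warehouse would run similar SQL.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT region, SUM(amount) FROM orders GROUP BY region"
        )
        return dict(cursor.fetchall())
    finally:
        conn.close()


if __name__ == "__main__":
    print(total_by_region_mapreduce(orders))
    print(total_by_region_sql(orders))
```

Both functions return the same totals per region. The SQL version leaves grouping and execution strategy to the engine, which is the part an analyst without engineering support can realistically own.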

Today, Amazon's Redshift has also decoupled compute from storage.

A growing need for better tooling

In the following years, data warehouse solution providers were able to further improve the architecture, separate storage and compute, and offer better price points and scalability. But transforming, modelling, cleaning, and converting data into actionable insights remained cumbersome and error-prone.

Fast-growing businesses became unhappy with what they were getting in return for their large infrastructure investments. Their data had grown in volume, variety and complexity, but the ecosystem still did not have the tools that could manage it well.

Privacy, too, was becoming a serious matter, and governments across the world wanted to protect their citizens from the risks of increasingly digitized information systems. This led to stringent regulatory frameworks such as the EU's GDPR and California's CCPA.

As the basic building blocks of the analytical data platform matured and stabilized, better data management and observability became critical. The ground was set for the development of a better set of tools that could address these challenges. Investors and entrepreneurs became interested, and the modern data stack became the focus of attention and innovation.
