登录查看更多内容

Data Lake vs Data Warehouse Architecture [5 out of 10]

Mahmoud Yassin

Senior Data Manager | Customer Data, Insights and AI | Helping Booking.com to modernize their data architecture on top of public cloud infrastructure

发布日期: 2023年2月5日

This is a?series?of articles to talk about the importance of having a solid data architecture in your business. The series will include the below articles:

1- Introduction to Data Architecture 

2- OLAP vs OLTP 

3- Data Warehouse Architecture deep dive 

4- Data Lake Architecture deep dive 

5- Data Lake vs Data Warehouse Architecture [current article]

6- Cloud computing effect on Data Architecture

7- Rise of Data Mesh Architecture

8- Data Mesh Vs the rest

9- Rise of DeltaLake Architecture

10- Which Data Architecture shall I choose?

A?data warehouse?stores current and historical data from one or more systems in a predefined and fixed schema, which allows business analysts and data scientists to easily analyse the data.

A?data lake?stores current and historical data from one or more systems in its raw form, which allows business analysts and data scientists to easily analyse the data.

Common patters between data lake and data warehouse:

t's important to note that while both data warehouses and data lakes can coexist in an organisations data pipeline, they serve different purposes and are used for different tasks.

Both the data architectures have a lot in common as both of them expecting that the data will be stored centrally in one place.

Data warehouses are optimised for fast querying and analysis of structured data, while data lakes are designed to store vast amounts of raw, unstructured data at low cost and allow for more flexible and cost-effective data processing.

Companies can use both solutions to meet their specific data needs, depending on the requirements of the specific use case.

Main differences between data lake and data warehouse:

1- Purpose

The data stored in a warehouse is processed for a specific application, such as log management, sales reporting, or security analysis. On the other hand, raw data stored in a data lake is kept without a defined purpose, but is preserved in case it becomes useful in the future.

2- Schema on read vs schema on write:

Schema-on-write is the central concept behind data warehouse architecture. It involves creating a schema or structure for the data before storing it in the database. This requires the use of Structured Query Language (SQL) and leads to a highly structured Relational Database Management System (RDBMS). One of the most time-consuming tasks in an RDBMS is performing Extract, Transform, Load (ETL) operations.

In contrast, schema-on-read is the central concept behind data lake architecture. It means creating a schema only when the data is being read, allowing unstructured data to be stored in the database. This makes it easier to incorporate new data sources quickly, as it is not necessary to define the schema beforehand.

The shift towards schema-on-read is driven by the growing amount of unstructured data and the overhead of ETL in an RDBMS. Analysts may not know the insights they can gain from new data sources, making the process of bringing in new data sources time-consuming.

3- Data preparation:

In data warehouse architecture, the data is cleaned, organised, processed, and transformed before being stored in the final layer of the warehouse. In contrast, data transformation is not necessary in data lake architecture, as the data is stored in its original form. The data will only be transformed when it is being read from the curated or consumption layers.

4- Supported data types:

Data warehouses primarily support structured data, as the data is expected to be stored in a structured manner. This creates difficulties when dealing with semi-structured or unstructured data.

领英推荐

Significance of Data Architecture: Uncovering best…

Techmango Technology Services Private Limited 3 个月前

Data Vault – A Modern Architecture for Your Enterprises

Lyftrondata 6 个月前

Azure Data Factory: A Beginner's Guide for Data…

vThink Global Technologies Private Limited 2 个月前

In contrast, data lakes allow the storage of data in its original format, regardless of the data type, making it easier to store both structured and unstructured data.

5- Users

Another difference between data lakes and warehouses is the type of user. Data warehouses can often be used by data analysts or business analysts with an understanding of the processed data.

Data lakes, on the other hand, are more complex due to the large amount of unstructured data, and require the expertise of a data scientist or data engineer to interpret and structure the data.

6- Data size & security

Data lakes can store massive amounts of information, with each unit holding 1,000 terabytes. Due to their large size and the fact that they store all types of data, they are less secure than smaller, structured data warehouses.

Furthermore, data warehouse technology is well established and its security measures are mature, while big data security is still evolving. However, it is expected that security measures for data lakes will improve over time.

7- Cost:

The cost of storage in a data lake is usually less expensive compared to a data warehouse. In addition, data lakes are easier to manage and result in lower operational costs.

On the other hand, data warehouses tend to be more expensive and require more time to manage, leading to higher operational costs.

8- Data storage:

A data lake contains all the raw and unstructured data of an organisation and can store it indefinitely for immediate or future use.

A data warehouse, on the other hand, holds structured data that has been processed and cleaned, ready for analysis based on predefined business requirements..

9- Processing time:

Data lakes allow users to access data before it has been transformed, cleaned, and structured, enabling them to reach their results more quickly compared to traditional data warehouses.

Data warehouses, on the other hand, provide insights into predefined questions for predefined data types, and making changes to the data warehouse takes more time.

10- Data time travel:

Data lakes can store all data, including data that is currently in use as well as data that might be useful in the future. The data is kept indefinitely, allowing for historical analysis.

In contrast, the development process of a data warehouse involves significant time spent on analysing various data sources. Once the data is ingested into the warehouse, it is not possible to recreate it in the event of an error, as the original data is lost during the ETL process.

Cloud computing has indeed transformed the way data is stored and processed, offering scalability, accessibility and cost-effectiveness to organisations. I look forward to write about this on my next article on the role of cloud computing in data architecture. I am sure it will be informative and engaging

Feel free to comment and let's engage to exchange opinions and benefit from each-others experiences.?

Join a growing community of 1500+ Data Enthusiasts by subscribing to my ????????????????????:?Data Architecture History

#data?#writing?#architecture?#cloudcomputing?#warehouse?#data?#dataarchitecture?#datawarehouse?#datalake?#datalakehouse?#datamesh?#datafabric?#dataanalytics?#ai?#bigdata?#knowledgesharing?#article

Data Architecture History

2,120 位关注者

(on sabbatical) Scott Hirleman (back mid next year maybe but prob not)

Data Mesh Radio Host - Helping People Understand and Implement Data Mesh Since 2020 ??

2 年

Mahmoud Yassin have you gotten in touch with Melissa Logan and team yet about the Data Mesh Learning blog and stuff?

1 次回应

Pieter Pauwels

Account executive @ The Data Intelligence Company

2 年

Alexandre t'Kint Stijn (Stan) Christiaens

1 次回应

Hicham Zmarrou, Ph.D

Lead Architect for Data & AI at Microsoft

2 年

Mahmoud, next article, combine both!

2 次回应

查看更多评论

要查看或添加评论，请登录

Mahmoud Yassin的更多文章

Rise of Data Mesh Architecture [7 out of 10]

2023年3月4日

Rise of Data Mesh Architecture [7 out of 10]

This is a series of articles to talk about the importance of having a solid data architecture in your business. The…

1 条评论
Cloud computing effect on Data Architecture [6 out of 10]

2023年2月19日

Cloud computing effect on Data Architecture [6 out of 10]

This is a series of articles to talk about the importance of having a solid data architecture in your business. The…

1 条评论
Data Lake Architecture [4 out of 10]

2023年1月18日

Data Lake Architecture [4 out of 10]

This is a series of articles to talk about the importance of having a solid data architecture in your business. The…

2 条评论
{Classical} Data Warehouse Architecture [3 out of 10]

2023年1月9日

{Classical} Data Warehouse Architecture [3 out of 10]

This is a series of articles to talk about the importance of having a solid data architecture in your business. The…

1 条评论
OnLine Analytical Processing (OLAP) vs OnLine Transaction Processing (OLTP) [2 out of 10]

2022年12月14日

OnLine Analytical Processing (OLAP) vs OnLine Transaction Processing (OLTP) [2 out of 10]

This is a series of articles to talk about the importance of having a solid data architecture in your business. The…

6 条评论
Introduction towards Data Architecture? What, Why, Who? [1 out of 10]

2022年12月13日

Introduction towards Data Architecture? What, Why, Who? [1 out of 10]

This is a series of articles to talk about the importance of having a solid data architecture in your business. The…

4 条评论
Introduction towards Data Architecture? What, Why, Who? [1 out of 10]

2022年12月6日

Introduction towards Data Architecture? What, Why, Who? [1 out of 10]

This is a series of articles to talk about the importance of having a solid data architecture in your business. The…

8 条评论
Hadoop In Action

2016年12月11日

Hadoop In Action

On 6-12-2016 I gave a practice session about Hadoop and big data. Session Title: Hadoop In Action Session description…
The Hadoop Ecosystem - The Tipping Point of Big Data

2016年11月25日

The Hadoop Ecosystem - The Tipping Point of Big Data

Introduction to Big data A deep introduction to the big data topic along with real adviceS of how to start a career in…

3 条评论

See all articles

Data Lake vs Data Warehouse Architecture [5 out of 10]

Mahmoud Yassin

Senior Data Manager | Customer Data, Insights and AI | Helping Booking.com to modernize their data architecture on top of public cloud infrastructure

Common patters between data lake and data warehouse:

Main differences between data lake and data warehouse:

1- Purpose

2- Schema on read vs schema on write:

3- Data preparation:

4- Supported data types:

领英推荐

5- Users

6- Data size & security

7- Cost:

8- Data storage:

9- Processing time:

10- Data time travel:

Data Architecture History

2,120 位关注者

Mahmoud Yassin的更多文章

社区洞察

其他会员也浏览了

What Are Data-Driven Projects Or Business Architectures?

Navigating the Data Landscape: Evolution from Traditional Warehousing to Modern Lakehouse Architecture

Data Management

Why Data Solutions need to be architected well?

Choosing the Right Data Architecture

Data Architecture Patterns: Choosing the Right Approach

Data as a Product - Data architecture the management blueprint

Data warehouse

Understanding Kimball and Inmon Data Warehouse Architectures

Data Warehouse vs Data Lake vs Data Lakehouse: What's Best for Your Organization?

Common patters between data lake and data warehouse:

Main differences between data lake and data warehouse:

1- Purpose

2- Schema on read vs schema on write:

3- Data preparation:

4- Supported data types:

领英推荐

5- Users

6- Data size & security

7- Cost:

8- Data storage:

9- Processing time:

10- Data time travel:

Data Architecture History

2,120 位关注者

Mahmoud Yassin的更多文章

Rise of Data Mesh Architecture [7 out of 10]

Cloud computing effect on Data Architecture [6 out of 10]

Data Lake Architecture [4 out of 10]

{Classical} Data Warehouse Architecture [3 out of 10]

OnLine Analytical Processing (OLAP) vs OnLine Transaction Processing (OLTP) [2 out of 10]

Introduction towards Data Architecture? What, Why, Who? [1 out of 10]

Introduction towards Data Architecture? What, Why, Who? [1 out of 10]

Hadoop In Action

The Hadoop Ecosystem - The Tipping Point of Big Data

社区洞察

其他会员也浏览了

What Are Data-Driven Projects Or Business Architectures?

Navigating the Data Landscape: Evolution from Traditional Warehousing to Modern Lakehouse Architecture

Data Management

Why Data Solutions need to be architected well?

Choosing the Right Data Architecture

Data Architecture Patterns: Choosing the Right Approach

Data as a Product - Data architecture the management blueprint

Data warehouse

Understanding Kimball and Inmon Data Warehouse Architectures

Data Warehouse vs Data Lake vs Data Lakehouse: What's Best for Your Organization?