Choosing between Lakehouse, Data Warehouse and Real-time Analytics in Microsoft Fabric.
Jacob Rønnow Jensen
Head of Data Platform @ AP Pension | Leadership | Data Delivery | Microsoft Data Platform | Business Intelligence | Digital Transformation | IT-Strategy | Data Warehousing | Data Architecture | Collaboration
It is very easy to ingest data into Microsoft Fabric and start working with it - whether in a Lakehouse, Data Warehouse, KQL database or Datamart.
Although there is a lot of overlap between these offerings, there are also some subtle differences that I feel are important to explore - and what better time than now, when Fabric is in Public Preview and a 60-day free trial is available.
The article Lakehouse end-to-end scenario: overview and architecture describes how an entire Data Delivery Platform - from the extraction of raw data through dimensional modelling - can be built using only Lakehouses. However, it also notes that “with the flexibility offered by Fabric, you can implement either Lakehouse or Data Warehouse architectures or combine these two together to get the best of both.”
Three scenarios for choosing between Data Warehouse, Lakehouse and Datamart are described in the article Fabric decision guide - lakehouse or data warehouse. The key differentiator here (as in the rest of the mainstream narrative on this subject) seems to be the skill set of the existing developer team - with an honorary mention of the possibility of multi-table transactions in the Data Warehouse.
The newly introduced Datamarts in Power BI still exist under the Fabric license, and to muddy the waters even further, a fourth path for ingesting and transforming data in your Data Delivery Platform is Real-Time Analytics in Microsoft Fabric, which through eventstreams and a KQL database offers some of the same capabilities as the Lakehouse and Warehouse. (Below, I have lined up the suggested architectures from the Lakehouse and Real-time Analytics documentation respectively, which I feel illustrates this point.)
I feel that Microsoft Fabric - with its unified experience, the open format on OneLake, and the rise of Copilots and LLMs - offers us the ability to change the way we work with data and analysis, and that the optimal combination for ingesting, modelling and serving data in Fabric should depend on the specific use case, performance, price and governance model - not on current skills or what we did in the past.
For my own use in the Public Preview, I have compiled a comparison of capabilities between the Lakehouse, Data Warehouse, Real-time Analytics and Power BI Datamart ... and I have highlighted the points that set them apart.
When working with data on an ad-hoc or exploratory basis, it can be a good idea to match the technology to the preferences of the persona working with it. Together with the Premium Per User license, this can make the case for using the Power BI Datamart for Citizen Developers ... just as the case can be made for offering the Lakehouse to Data Scientists and the Data Warehouse to Data Analysts.
For Data Engineers, my suggestion right now is to start experimenting with the different paths and technologies and monitor the performance and limitations ... and to that end, I hope that the table above can give you an idea of where to start.
A few additional comments:
Currently, performance in the Lakehouse seems better than in the Warehouse, but this is to be expected, since performance, concurrency and scale have not been the focus of the current release.
At the time of GA, I would expect the Data Warehouse artifact to also support Direct Lake, but for now, this can only be tested against a Lakehouse.
The support for multi-table transactions in the Data Warehouse artifact could be a key factor when choosing how to implement your gold layer, if you want to ensure that related tables (e.g. in a star schema) are updated together or not updated at all. I also feel that one should take the warning from Better together - the lakehouse and warehouse seriously: "A SQL Endpoint is not scoped to data analytics in just the Fabric Lakehouse."
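To make the value of multi-table transactions concrete, here is a minimal sketch of the pattern. Since I cannot embed a live Warehouse connection here, it uses Python's standard-library sqlite3 module (with made-up table names) purely to illustrate the guarantee: related fact and dimension writes either all commit or all roll back. Against a Fabric Warehouse you would express the same batch in T-SQL (BEGIN TRANSACTION ... COMMIT) over the SQL endpoint.

```python
import sqlite3

# In-memory stand-in for a gold-layer star schema (hypothetical table names).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT)")
con.execute("CREATE TABLE fact_sales (customer_key INTEGER, amount REAL)")

# Batch 1: both inserts run in one transaction and commit together.
with con:
    con.execute("INSERT INTO dim_customer VALUES (1, 'Contoso')")
    con.execute("INSERT INTO fact_sales VALUES (1, 99.5)")

# Batch 2: the second statement fails (3 values for a 2-column table),
# so the whole transaction rolls back - including the dimension insert.
try:
    with con:
        con.execute("INSERT INTO dim_customer VALUES (2, 'Fabrikam')")
        con.execute("INSERT INTO fact_sales VALUES (2, 10.0, 'extra')")  # error
except sqlite3.Error:
    pass  # neither table was touched by batch 2

dims = con.execute("SELECT COUNT(*) FROM dim_customer").fetchone()[0]
facts = con.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
print(dims, facts)  # only the rows from the first, successful batch remain
```

Without this guarantee (as in engines that only offer single-table transactions), the failed batch would have left an orphaned dimension row - exactly the kind of half-updated star schema the Warehouse lets you avoid.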
The Warehouse and SQL Endpoint share the same underlying processing architecture regardless of the artifact, but the Data Warehouse is the only one that supports "ingestion separation" where ingestion jobs can run on dedicated nodes that are optimized for ETL and do not compete with other queries or applications for resources.
OneLake shortcuts can be used to create read-only replicas of tables in other workspaces to distribute load across multiple SQL engines, creating an isolation boundary.
As for security, data access has its own model depending on the engine you are accessing. Before going ahead and implementing shortcuts, you should know that when accessing them through Power BI Datasets or T-SQL, the calling user’s identity is not passed through to the shortcut target. The calling item owner’s identity is passed instead, delegating access to the calling user. Additionally, there is no automated process for cleaning up cascading shortcuts.
I feel that there is still a lot to be done to prevent access management and security from becoming total chaos, and I would love to be able to handle all data sharing, shortcuts and access control centrally in Purview - not just the ones mentioned in Administration, Security and Governance in Microsoft Fabric.
BTW: if you agree, you can give my suggestion a vote or a comment in the ideas section of the Fabric community site.
Product @ Microsoft
1y - Hi Jacob - great article. I want to highlight that KQL databases now support querying shortcuts in delta format. There are other factors, such as streaming ingestion latency and time series functionalities, that might affect the decision. Please check this updated guidance - https://learn.microsoft.com/en-us/fabric/get-started/decision-guide-warehouse-lakehouse