Choosing between Lakehouse, Data Warehouse and Real-time Analytics in Microsoft Fabric.

It is very easy to ingest data into Microsoft Fabric and start working with it - whether in a Lakehouse, Data Warehouse, KQL database or Datamart.

Although there is a lot of overlap between these offerings, there are also some subtle differences that I feel are important to explore - and what better time than now, while Fabric is in Public Preview and a 60-day free trial is available.

The article Lakehouse end-to-end scenario: overview and architecture describes how an entire Data Delivery Platform - from the extraction of raw data through dimensional modelling - can be built using only Lakehouses. However, it also notes that “with the flexibility offered by Fabric, you can implement either Lakehouse or Data Warehouse architectures or combine these two together to get the best of both.”

Three scenarios for choosing between Data Warehouse, Lakehouse and Datamart are described in the article Fabric decision guide - lakehouse or data warehouse. The key differentiator here (as in the rest of the mainstream narrative on this subject) seems to be the skill set of the existing developer team - with an honorable mention of the possibility of multi-table transactions with Data Warehouse.

The newly introduced Datamarts in Power BI still exist under the Fabric license. To muddy the waters even further, there is a fourth path for ingesting and transforming data in your Data Delivery Platform: Real-Time Analytics in Microsoft Fabric, which through eventstreams and a KQL database offers some of the same capabilities as the Lakehouse and Warehouse. (Below, I have lined up the suggested architectures from the Lakehouse and Real-Time Analytics documentation respectively, which I feel illustrates this point.)

[Image: suggested architectures from the Lakehouse and Real-Time Analytics documentation, side by side]

I feel that Microsoft Fabric, with the unified experience, the open format on OneLake, and the rise of Copilots and LLMs, offers us the ability to change the way we work with data and analysis - and that the optimal combination for ingesting, modelling and serving data in Fabric should depend on the specific use case, performance, price and governance model - not on current skills or what we did in the past.

For my own use in the Public Preview, I have compiled a comparison of capabilities between the Lakehouse, Data Warehouse, Real-Time Analytics and Power BI Datamart ... and I have highlighted the points that set them apart.

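[Image: comparison table of capabilities for Lakehouse, Data Warehouse, Real-Time Analytics and Power BI Datamart, with the differentiating points highlighted]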

When working with data on an ad-hoc or exploratory basis, it can be a good idea to match the technology to the preferences of the persona working with it. Together with the Premium Per User license, this can make the case for the Power BI Datamart for Citizen Developers ... just as the case can be made for offering the Data Lake to Data Scientists and the Data Warehouse to Data Analysts.

For Data Engineers, my suggestion right now is to start experimenting with the different paths and technologies and monitor the performance and limitations ... and to that end, I hope that the table above can give you an idea of where to start.

A few additional comments:

Currently the performance in the Lakehouse seems better than in the Warehouse, but this is to be expected since performance, concurrency and scale have not been the focus of the current release.

At the time of GA, I would expect the Data Warehouse artifact to support DirectLake as well, but for now, this can only be tested against a Lakehouse.

The support for multi-table transactions in the Data Warehouse artifact could be a key factor when choosing how to implement your gold layer, if you want to ensure that related tables (e.g. in a star schema) are either updated together or not updated at all. I also feel that one should take the warning from Better together - the lakehouse and warehouse seriously: "A SQL Endpoint is not scoped to data analytics in just the Fabric Lakehouse."
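
To make the "updated together or not updated at all" point concrete, here is a minimal sketch of a multi-table transaction. It is only an illustration of the principle: it uses Python's built-in sqlite3 module as a stand-in, not the Fabric Data Warehouse, and the table names (dim_customer, fact_sales) and the helper functions are hypothetical.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a tiny, hypothetical star schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE fact_sales (sale_id INTEGER PRIMARY KEY, "
        "customer_id INTEGER NOT NULL, amount REAL NOT NULL)"
    )
    return conn


def load_gold_layer(conn, dim_rows, fact_rows):
    """Load dimension and fact rows in ONE transaction.

    Returns True if both tables were updated, False if nothing was.
    """
    try:
        # 'with conn' commits if the block succeeds and rolls back on an exception.
        with conn:
            conn.executemany(
                "INSERT INTO dim_customer (customer_id, name) VALUES (?, ?)", dim_rows
            )
            conn.executemany(
                "INSERT INTO fact_sales (sale_id, customer_id, amount) VALUES (?, ?, ?)",
                fact_rows,
            )
        return True
    except sqlite3.Error:
        return False


def count(conn, table):
    """Row count for one of the two tables above."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


conn = make_db()

# The fact row is invalid (amount is NULL), so the dimension row is rolled back too.
load_gold_layer(conn, [(1, "Contoso")], [(10, 1, None)])
print(count(conn, "dim_customer"), count(conn, "fact_sales"))

# A valid load updates both tables together.
load_gold_layer(conn, [(1, "Contoso")], [(10, 1, 99.5)])
print(count(conn, "dim_customer"), count(conn, "fact_sales"))
```

Without a transaction spanning both tables, the first load would have left a dimension row behind with no matching facts, which is exactly the kind of half-updated gold layer you want to avoid.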

The Warehouse and SQL Endpoint share the same underlying processing architecture regardless of the artifact, but the Data Warehouse is the only one that supports "ingestion separation" where ingestion jobs can run on dedicated nodes that are optimized for ETL and do not compete with other queries or applications for resources.

OneLake shortcuts can be used to create read-only replicas of tables in other workspaces, distributing load across multiple SQL engines and creating an isolation boundary.

As for security, data access has its own model depending on the engine you are accessing. Before going ahead and implementing shortcuts, you should know that when accessing them through Power BI Datasets or T-SQL, the calling user’s identity is not passed through to the shortcut target. The calling item owner’s identity is passed instead, delegating access to the calling user. Additionally, there is no automated process for cleaning up cascading shortcuts.
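
The identity delegation described above can be expressed as a toy model. This is my own simplification for illustration, not a Fabric API: the function name, the access-path labels and the example users are hypothetical, and the model only covers the two access paths named in the text.

```python
from typing import Optional

# Access paths for which, per the text above, the item owner's identity
# (not the calling user's) is passed to the shortcut target.
DELEGATED_PATHS = {"power_bi_dataset", "t_sql"}


def identity_seen_by_target(
    access_path: str, calling_user: str, item_owner: str
) -> Optional[str]:
    """Return the identity the shortcut target evaluates access for.

    Returns None for access paths this toy model does not cover.
    """
    if access_path in DELEGATED_PATHS:
        # calling_user is deliberately ignored here: the target never sees it.
        return item_owner
    return None


# A hypothetical analyst queries a shortcut through T-SQL:
# the target evaluates the item owner's permissions.
print(identity_seen_by_target("t_sql", "analyst@example.com", "owner@example.com"))
```

The practical consequence is that anyone who can query the calling item effectively reads the shortcut target with the owner's permissions, so access has to be controlled on the calling item as well as on the target.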

I feel that there is still a lot to be done to prevent access management and security from becoming total chaos, and I would love to be able to handle all data sharing, shortcuts and access control centrally in Purview - not just the ones mentioned in Administration, Security and Governance in Microsoft Fabric.

BTW: if you agree, you can give my suggestion a vote or a comment in the ideas section of the Fabric community site.


Anshul Sharma

Product @ Microsoft

1 year ago

Hi Jacob - great article, I want to highlight that KQL databases now support querying shortcuts in delta format. There are other factors such as streaming ingestion latency, time series functionalities that might affect the decision. Please check this updated guidance - https://learn.microsoft.com/en-us/fabric/get-started/decision-guide-warehouse-lakehouse
