Microsoft Fabric Storage Cost Optimization using a Storage Lifecycle Strategy

Currently, Microsoft Fabric storage costs are linear: there is no tier-based storage, and all data is kept highly available at all times. From a cost perspective, that makes fast-growing storage hard to sustain over time.

This poses a problem: costs will grow as data grows, with no option to archive data or move it to a cheaper storage tier.

However, there is a concept we developed that can help you with this problem.

What if you focus on a compression strategy, rather than depending on a storage tier update? Let me explain.

Say your organization mostly works with data from the last two years, for example for year-over-year KPI analyses, and rarely touches anything older. In that case, data older than two years has different requirements: it is accessed less frequently, does not need to be available in real time, and is certainly not updated on a daily basis. That means we can make a compromise.

Microsoft Fabric supports various compression formats: Snappy, ZSTD, Gzip, and LZO.

Looking at these with Power BI compatibility in mind, Snappy stands out: it is optimized for both reads and writes, strikes a balance between compression and speed, and is lightweight and efficient for read-heavy workloads. It is natively supported by Power BI for reading Delta Parquet files, and its low CPU usage makes it well suited for real-time or interactive workloads, so Direct Lake and DirectQuery work great with it. The trade-off is a medium compression ratio.

Now let's look at ZSTD. It is a modern compression algorithm with relatively fast decompression and a relatively high compression ratio, which makes it the most cost-effective option for storage. However, it is only partially supported by Power BI, so it is best suited to large datasets that are queried less frequently. That makes it a good fit for Import mode in Power BI: during testing I ran into issues accessing this compression type in Direct Lake mode, while DirectQuery and Import mode worked (see below for the proof of concept).
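As a point of reference, the compression codec used for a Delta table written from a Fabric notebook can be controlled through the Spark session configuration. The snippet below is a minimal sketch, not part of the original test; the table paths are illustrative.

```python
# Minimal sketch (PySpark in a Fabric notebook): writing the same DataFrame
# as a Delta table with different Parquet codecs. Paths are illustrative.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.read.format("delta").load("Tables/sales")  # hypothetical source table

# Snappy is the default Parquet codec; setting it explicitly for clarity.
spark.conf.set("spark.sql.parquet.compression.codec", "snappy")
df.write.format("delta").mode("overwrite").save("Tables/sales_snappy")

# Switch the session codec to ZSTD for the more compact copy.
spark.conf.set("spark.sql.parquet.compression.codec", "zstd")
df.write.format("delta").mode("overwrite").save("Tables/sales_zstd")
```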

Looking at the summary below, the other compression types are not relevant for this use case, as they are either not compatible with Power BI or have a low compression ratio.

Here is a summary table of the two relevant compression types:

| Compression type | Compression ratio | Speed / CPU usage | Power BI compatibility |
| --- | --- | --- | --- |
| Snappy | Medium | Fast, low CPU usage | Native (Direct Lake, DirectQuery, Import) |
| ZSTD | High | Medium speed and CPU usage, fast decompression | Partial (Import and DirectQuery; issues in Direct Lake) |

So, what does this mean? Can we combine compression types, using a different compression for older data with different requirements? Yes, we can. We can combine, for example, ZSTD, with its high compression ratio and medium speed and CPU usage, for older data with Snappy for current data that is needed frequently and in real time or near real time. But how does this work in practice?

1.) Snappy for current data:

Use a notebook in Fabric that keeps a Delta table containing only the data from the last two years: it loads new data every day and moves anything older than two years into a separate Delta table. Then run the VACUUM command to physically remove the deleted files and free up storage. This table is queried in Power BI using DirectQuery mode; it can also be queried in Import mode if the data does not need to be fully up to date.
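A minimal sketch of what that daily maintenance step could look like in a Fabric notebook, assuming a hot table at Tables/sales_hot with an OrderDate column (all names are illustrative):

```python
# Daily trim of the "hot" (Snappy) table: remove rows older than two years,
# then vacuum so the deleted Parquet files are physically removed.
# Run the archive append from step 2 for the same cutoff *before* this delete.
from datetime import date, timedelta
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
cutoff = (date.today() - timedelta(days=2 * 365)).isoformat()

hot = DeltaTable.forPath(spark, "Tables/sales_hot")
hot.delete(F.col("OrderDate") < F.lit(cutoff))

# 168 hours (7 days) is the default retention; lowering it frees space sooner
# but also limits time travel on the table.
hot.vacuum(168)
```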

2.) ZSTD for older data:

The data removed from the Snappy Delta table is appended to a separate ZSTD-compressed Delta table that serves as the archive for older data. Because of the compatibility limitations described above, this table is accessed in Power BI via Import mode. This keeps the footprint of data older than two years to a minimum.
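A sketch of that archive step, reusing the same illustrative names as in step 1; it should run before the delete above so no rows are lost:

```python
# Append the rows that fall outside the two-year window to a ZSTD-compressed
# archive table. Table paths and the OrderDate column are illustrative.
from datetime import date, timedelta
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
cutoff = (date.today() - timedelta(days=2 * 365)).isoformat()

aged = (
    spark.read.format("delta")
         .load("Tables/sales_hot")
         .where(F.col("OrderDate") < F.lit(cutoff))
)

# Write the archive files with ZSTD instead of the default Snappy codec.
spark.conf.set("spark.sql.parquet.compression.codec", "zstd")
aged.write.format("delta").mode("append").save("Tables/sales_archive")
```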

3.) Composite Model in Power BI:

Now imagine users want access to all the data. You can create a composite model in Power BI, combined with a union query further upstream, for example as a view on the SQL analytics endpoint. In Power BI you then import the data older than two years and use DirectQuery for the data within the last two years. This ensures compatibility.
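For illustration, the union logic itself is simple. The sketch below expresses it as a Spark SQL view from a notebook (table and view names are made up); in the approach described here, the equivalent view would be defined on the SQL analytics endpoint in T-SQL, where the SELECT ... UNION ALL part looks essentially the same.

```python
# Minimal sketch of the union over the hot and archive tables, expressed
# as a Spark SQL view; table and view names are illustrative.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.sql("""
    CREATE OR REPLACE VIEW sales_all AS
    SELECT * FROM sales_hot
    UNION ALL
    SELECT * FROM sales_archive
""")
```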

Here is an example of a test using the different compression types for the same source parquet file:
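A sketch of how such a comparison can be reproduced in a Fabric notebook follows; the source path and table names are illustrative, not the ones from the original test.

```python
# Write the same source data once per codec and compare the resulting
# Delta table sizes; DESCRIBE DETAIL reports the size in bytes.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
source = spark.read.parquet("Files/source/sales.parquet")  # hypothetical source file

for codec, target in [("snappy", "Tables/size_test_snappy"), ("zstd", "Tables/size_test_zstd")]:
    spark.conf.set("spark.sql.parquet.compression.codec", codec)
    source.write.format("delta").mode("overwrite").save(target)
    size_bytes = spark.sql(f"DESCRIBE DETAIL delta.`{target}`").first()["sizeInBytes"]
    print(f"{codec}: {size_bytes / (1024 * 1024):.1f} MB")
```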


You can see that ZSTD uses about 60 MB of storage for a Delta table with about 12.5 million rows.

You can see that Snappy uses about 72.5 MB of storage for a Delta table with about 12.5 million rows.

How it looks in Power BI:

Proof of concept for the tables visualized in Power BI:

The end result of this concept is that we optimize storage and costs for data older than two years, while still compressing current data and meeting its requirements. In the long run this pays off: you save on storage costs while keeping the data accessible. And if Microsoft at some point releases tier-based storage for Fabric, you could combine the compression strategy with storage tiers to optimize even further, saving costs in both the short and the long term.
