Generating a Date Dimension Table for Direct Lake Model in Microsoft Fabric

Generating a Date Dimension Table for Direct Lake Model in Microsoft Fabric

One of the key elements of a semantic model is the Date table. It provides a reference for all time-related calculations and analyses. A date table is a table that contains a row for each day in a specific range and additional columns that describe the attributes of the date, such as year, month, quarter, week, day name, ...

A date table is essential for the use of Time Intelligence functions in DAX. These functions enable you to perform calculations based on periods, such as year-to-date, quarter-to-date, month-to-date, etc.

A date table also enables the use of relative date Filters and Slicers in Power BI reports. Users can easily filter data by dynamic periods, such as the last 5 days, the next 6 months, the previous year, etc. These functions rely on the date table to provide the context and granularity of the time dimension.

Power BI lets you create several physical relationships between two tables. Of these relationships, only one is active, while the others are inactive (represented by a dotted line). Active relationships propagate filters by default, so the choice of active relationships is very important and will influence the possible analyses. Role-playing dimensions are dimensions that are used several times in a fact table. In the following example, the Sales table contains several dates, such as the Invoice Date Key and the Delivery Date Key, both of which can be linked to the Date column in the Date table. It is therefore possible to analyze sales according to invoice date or delivery date. For more information on activating DAX relationships: https://learn.microsoft.com/power-bi/guidance/relationships-active-inactive

Role-playing dimensions

A date dimension can be created manually, imported from an external source or generated using the DAX or M functions. It is often advisable to perform as many operations as possible upstream, which is why it is preferable to create this date dimension in Power Query rather than in DAX. This way, even if the table isn't very large, its compression will be optimized.

Here are some examples of how to create this Date table: https://learn.microsoft.com/power-bi/guidance/model-date-tables

I've long been using the Date dimension generated via a function created by Chris Webb: Generating A Date Dimension Table In Power Query.

However, Direct Lake does not support the use of Power Query, or even Composite Models >>> All tables must first be available in OneLake in Delta format.

One option would be to create tables in M in Dataflow Gen 2, while another approach would be to use Python in a Notebook in Microsoft Fabric.

So here's how to implement the latter solution:

Once you've manually attached a Lakehouse to a Notebook or in Python as explained in Sandeep Pawar's article : How to Attach a Default Lakehouse to a Notebook in Fabric, here's an example script that illustrates the creation of a Date dimension in the associated Lakehouse:

import pandas as pd
import holidays ? ? #https://pypi.org/project/holidays/
def DimDate(start_date, end_date):
  ? from pandas.tseries.offsets import MonthEnd, QuarterEnd
  ? df_date = pd.DataFrame({"Date": pd.date_range(start=f'{start_date}', end=f'{end_date}', freq='D')})
  ? def get_end_of_month(pd_date):
  ? ? ? if pd_date.is_month_end == True:
  ? ? ? ? ? return pd_date
  ? ? ? else:
  ? ? ? ? ? return pd_date + MonthEnd(1)
  ? def get_end_of_quarter(pd_date):
  ? ? ? if pd_date.is_quarter_end == True:
  ? ? ? ? ? return pd_date
  ? ? ? else:
  ? ? ? ? ? return pd_date + QuarterEnd(1)
  ? df_date["Day"] = df_date.Date.dt.day
  ? df_date["Week"] = df_date.Date.dt.weekday
  ? df_date["Month"] = df_date.Date.dt.month
  ? df_date["MonthName"] = df_date.Date.dt.month_name()
  ? df_date["Quarter"] = df_date.Date.dt.quarter
  ? df_date["Year"] = df_date.Date.dt.year
  ? df_date["FiscalYear"] = df_date['Date'].dt.to_period('A-JUN')
  ? df_date['EndOfMonth'] = df_date['Date'].apply(get_end_of_month)
  ? df_date['EOM'] = df_date['Date'].dt.is_month_end
  ? df_date['EndOfQuarter'] = df_date['Date'].apply(get_end_of_quarter)
  ? df_date['EOQ'] = df_date['Date'].dt.is_quarter_end
  ? return df_date

def DimDateWorkday(start_date, end_date):
  ? df_date = DimDate(start_date, end_date)
  ? df_date['Workday'] = True
  ? for index, row in df_date.iterrows():
  ? ? ? if row['Day'] in ['Saturday', 'Sunday']:
  ? ? ? ? ? df_date.loc[index, 'Workday'] = False
  ? ? ? date = row['Date'].strftime("%Y-%m-%d")
  ? ? ? if date in holidays.France():
  ? ? ? ? ? df_date.loc[index, 'Workday'] = False
  ? return df_date

#DimDate('2024-01-01', '2025-12-31')
#DimDateWorkday('2024-01-01', '2025-12-31')
df_DimDate = spark.createDataFrame(DimDateWorkday('2024-01-01', '2025-12-31'))
df_DimDate.write.mode("overwrite").format("delta").saveAsTable('Calandar')        

I'm using the Python Holidays library, which lets you add working days and days based on different countries (in this case, France). Feel free to consult the library's documentation: https://pypi.org/project/holidays/.

The Notebook is available at the following address: DateDimensionTable.ipynb

Result:

Once executed, the Calendar table will be present in the Lakehouse, accessible via the SQL Endpoint and also available for your various semantic models in Direct Lake. Once the Calendar table has been created, you can mark it as the model's Date table, on a semantic model by non Default (Direct Lake mode and Power BI reports):

It will then be possible to create DAX measures using Time Intelligence functions to calculate, for example, the cumulative total for the current year: https://learn.microsoft.com/dax/time-intelligence-functions-dax

On top of the model created during my previous blog article on Monitoring the quality of Power BI Semantics Models over time | LinkedIn

BP Issues YTD = TOTALYTD ([BP Issues], calandar[Date])        


Lohic Beneyzet-Jouy

Leading data & analytics digital transformation program ?? at Royal Den Hartogh Logistics ??(Contractor) | Manager BI ??♀? | PSPO2 | PSM1 | SAFe | MCSA Power BI

2 个月
回复
Amal BEN REBAI

Analytics Engineer | BI Consultant | Power BI Expert | Data Enthusiast | Microsoft Certified: Fabric Analytics Engineer Associate | Power BI Data Analyst Associate | Azure Data Engineer Associate

2 个月

Thanks for sharing Romain Casteres! very useful

Hila Galapo

Senior Data Engineer at TMX Transform | Data Analysis | Data Engineering | Data Warehouse | Data Lake

2 个月

It’s an interesting choice by Fabric, to use Python and not Power Query… and yes, like a chocolate cake, we could always have another script for Date table

Tristan Malherbe

Founder @Data Pulse | Microsoft MVP

2 个月

Great article and interesting PySpark + notebook use case. Thanks Romain ?? Good news is that once this table is created in the Fabric Lakehouse, we can create multiple shortcuts in order to reuse this Calendar table as much as we want without duplicating the data ??

Romain Casteres

Cloud Solution Architect @ Microsoft | Unlocking Customer Data Value | Learn & Share

2 个月

要查看或添加评论,请登录

Romain Casteres的更多文章

  • Microsoft Fabric - Google BigQuery

    Microsoft Fabric - Google BigQuery

    In today's data-driven world, organizations often find themselves with vast amounts of data scattered across numerous…

    19 条评论
  • Monitoring Microsoft Fabric Copilot Adoption

    Monitoring Microsoft Fabric Copilot Adoption

    Copilot for Microsoft Fabric offers a range of features designed to enrich the way you work with data and bring new…

    4 条评论
  • Monitoring the quality of Power BI Semantics Models over time

    Monitoring the quality of Power BI Semantics Models over time

    Introduction More than 3 years ago I wrote this article Power BI Best Practice Analyzer on analyzing the quality of all…

    19 条评论
  • Analyzing Solar Panel production and home energy consumption in real time with Microsoft Fabric

    Analyzing Solar Panel production and home energy consumption in real time with Microsoft Fabric

    Introduction Microsoft Fabric is a cloud-based platform that lets you, among other Data Activity, stream, analyze, and…

    6 条评论
  • Le DataOps avec Power BI et Microsoft Fabric

    Le DataOps avec Power BI et Microsoft Fabric

    Petit article de retour sur un événement communautaire du Club Power BI Les communautés sont des groupes de personnes…

    8 条评论
  • Analyse de la tempête Ciaran avec Microsoft Fabric

    Analyse de la tempête Ciaran avec Microsoft Fabric

    Toujours à la recherche de données Open Source afin de mettre en exergue les capacités des solutions Analytics…

    7 条评论
  • Supervise the use of Microsoft Fabric

    Supervise the use of Microsoft Fabric

    Already one month since the official announcement of Microsoft Fabric and his public Preview: https://azure.microsoft.

    1 条评论
  • Microsoft Fabrics Monitoring Tenant Settings

    Microsoft Fabrics Monitoring Tenant Settings

    I am sure you have heard about all great announcements from last Microsoft Build. If it’s not the case, most of them…

    28 条评论
  • Power BI Dataset Scale-Out

    Power BI Dataset Scale-Out

    1 - Introduction Power BI has evolved into the leading platform for both self-service and IT-managed enterprise…

    1 条评论
  • DataOps for Power BI

    DataOps for Power BI

    Have you ever overwritten a Power BI report in the service before realizing that you haven't edited the latest version…

    32 条评论

社区洞察

其他会员也浏览了