Generating a Date Dimension Table for Direct Lake Model in Microsoft Fabric
Romain Casteres
Cloud Solution Architect @ Microsoft | Unlocking Customer Data Value | Learn & Share
One of the key elements of a semantic model is the Date table. It provides a reference for all time-related calculations and analyses. A date table is a table that contains a row for each day in a specific range and additional columns that describe the attributes of the date, such as year, month, quarter, week, day name, and so on.
A date table is essential for the use of Time Intelligence functions in DAX. These functions enable you to perform calculations based on periods, such as year-to-date, quarter-to-date, month-to-date, etc.
A date table also enables the use of relative date Filters and Slicers in Power BI reports. Users can easily filter data by dynamic periods, such as the last 5 days, the next 6 months, the previous year, etc. These functions rely on the date table to provide the context and granularity of the time dimension.
Power BI lets you create several physical relationships between two tables. Of these relationships, only one is active, while the others are inactive (represented by a dotted line). Active relationships propagate filters by default, so the choice of active relationships is very important and will influence the possible analyses. Role-playing dimensions are dimensions that are used several times in a fact table. In the following example, the Sales table contains several dates, such as the Invoice Date Key and the Delivery Date Key, both of which can be linked to the Date column in the Date table. It is therefore possible to analyze sales according to invoice date or delivery date. For more information on activating DAX relationships: https://learn.microsoft.com/power-bi/guidance/relationships-active-inactive
A date dimension can be created manually, imported from an external source or generated using the DAX or M functions. It is often advisable to perform as many operations as possible upstream, which is why it is preferable to create this date dimension in Power Query rather than in DAX. This way, even if the table isn't very large, its compression will be optimized.
Here are some examples of how to create this Date table: https://learn.microsoft.com/power-bi/guidance/model-date-tables
I've long been using the Date dimension generated via a function created by Chris Webb: Generating A Date Dimension Table In Power Query.
However, Direct Lake supports neither Power Query transformations nor Composite Models: all tables must first be available in OneLake in Delta format.
One option would be to create tables in M in Dataflow Gen 2, while another approach would be to use Python in a Notebook in Microsoft Fabric.
So here's how to implement the latter solution:
First, attach a Lakehouse to the Notebook, either manually or programmatically as explained in Sandeep Pawar's article: How to Attach a Default Lakehouse to a Notebook in Fabric; a sketch of the programmatic option is shown below.
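As a minimal sketch of the programmatic route (the name and IDs below are placeholders; the %%configure session magic is the mechanism described in the Fabric documentation and in Sandeep's article), a first notebook cell could look like this:

%%configure
{
    "defaultLakehouse": {
        "name": "MyLakehouse",
        "id": "<optional-lakehouse-guid>",
        "workspaceId": "<optional-workspace-guid>"
    }
}

With the Lakehouse attached, here's an example script that illustrates the creation of a Date dimension in the associated Lakehouse: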
import pandas as pd
import holidays  # https://pypi.org/project/holidays/

def DimDate(start_date, end_date):
    from pandas.tseries.offsets import MonthEnd, QuarterEnd

    # One row per day between start_date and end_date (inclusive)
    df_date = pd.DataFrame({"Date": pd.date_range(start=start_date, end=end_date, freq='D')})

    def get_end_of_month(pd_date):
        if pd_date.is_month_end:
            return pd_date
        return pd_date + MonthEnd(1)

    def get_end_of_quarter(pd_date):
        if pd_date.is_quarter_end:
            return pd_date
        return pd_date + QuarterEnd(1)

    df_date["Day"] = df_date.Date.dt.day
    df_date["Week"] = df_date.Date.dt.weekday              # day of week (0 = Monday, 6 = Sunday)
    df_date["Month"] = df_date.Date.dt.month
    df_date["MonthName"] = df_date.Date.dt.month_name()
    df_date["Quarter"] = df_date.Date.dt.quarter
    df_date["Year"] = df_date.Date.dt.year
    # Fiscal year ending in June, stored as string (pandas Period objects can't be serialized by Spark)
    df_date["FiscalYear"] = df_date['Date'].dt.to_period('A-JUN').astype(str)
    df_date['EndOfMonth'] = df_date['Date'].apply(get_end_of_month)
    df_date['EOM'] = df_date['Date'].dt.is_month_end
    df_date['EndOfQuarter'] = df_date['Date'].apply(get_end_of_quarter)
    df_date['EOQ'] = df_date['Date'].dt.is_quarter_end
    return df_date

def DimDateWorkday(start_date, end_date):
    df_date = DimDate(start_date, end_date)
    df_date['Workday'] = True
    fr_holidays = holidays.France()  # build the holiday calendar once, not on every row
    for index, row in df_date.iterrows():
        # Weekends are not workdays
        if row['Date'].day_name() in ['Saturday', 'Sunday']:
            df_date.loc[index, 'Workday'] = False
        # French public holidays are not workdays
        if row['Date'].strftime("%Y-%m-%d") in fr_holidays:
            df_date.loc[index, 'Workday'] = False
    return df_date

#DimDate('2024-01-01', '2025-12-31')
#DimDateWorkday('2024-01-01', '2025-12-31')

# Convert the pandas DataFrame to a Spark DataFrame and save it as a Delta table in the Lakehouse
df_DimDate = spark.createDataFrame(DimDateWorkday('2024-01-01', '2025-12-31'))
df_DimDate.write.mode("overwrite").format("delta").saveAsTable('Calendar')
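Once the cell has run, an optional sanity check from another notebook cell confirms that the Delta table landed in the attached Lakehouse (a simple sketch; the Calendar table name matches the saveAsTable call above):

# Sanity check: row count and date range of the generated Delta table
display(spark.sql("SELECT COUNT(*) AS RowCount, MIN(`Date`) AS FirstDate, MAX(`Date`) AS LastDate FROM Calendar"))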
I'm using the Python holidays library, which provides public holiday calendars for many countries (France in this case) and makes it easy to flag working days. Feel free to consult the library's documentation: https://pypi.org/project/holidays/.
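As a small illustration of the library (a sketch with an assumed year; see the documentation for the full API), switching to another country only requires another country class or ISO code:

import holidays

# French public holidays for 2024, as used in the script above
fr_holidays = holidays.France(years=2024)
print("2024-07-14" in fr_holidays)    # True: 14 July is a public holiday in France
print(fr_holidays.get("2024-07-14"))  # prints the local holiday name

# Same idea for another country, via its ISO code
us_holidays = holidays.country_holidays("US", years=2024)
print("2024-07-04" in us_holidays)    # True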
The Notebook is available at the following address: DateDimensionTable.ipynb
Result:
Once executed, the Calendar table will be present in the Lakehouse, accessible via the SQL Endpoint and available to your various Direct Lake semantic models. Once the Calendar table has been added to a custom (non-default) semantic model in Direct Lake mode used by your Power BI reports, you can mark it as the model's Date table:
It will then be possible to create DAX measures using Time Intelligence functions to calculate, for example, the cumulative total for the current year: https://learn.microsoft.com/dax/time-intelligence-functions-dax
Building on the model created in my previous blog article, Monitoring the quality of Power BI Semantic Models over time, here is an example measure:
BP Issues YTD = TOTALYTD([BP Issues], Calendar[Date])