DBT in Microsoft Fabric: MY JOURNAL
Update 20231114: Git Installs on Runtime 1.2 (link) - thanks venkims fromMS.
Update 20231108: Change runtime version in Spark Setting from 1.2 to 1.1 (Runtime version 1.2 does not have git).
Update 20230930: Remove docker and azure parts.
Init 20230627: Publication date.
Hi, my name is Nghia. This is my journal on how I applied Microsoft Fabric combined with dbt to a project in preparation for new customers in the future.
As I work with data, I always focus on the pattern: What - Why - How.
What
Microsoft Fabric is a new unified data platform by Microsoft that provides everything we need to build a Data System. We can build a Data Warehouse inside Fabric.
Dbt is a data transformation workflow that helps you get more work done while producing higher-quality results.
I want to combine both of them together, as I believe it will have a positive impact.
Why
First, when I recently did a workshop about Fabric, it wasn't very practical. I tried live coding and explaining, but it didn't work well. Not many people understood what I was saying. Maybe my explanation was not clear enough. I am down-to-earth, so I need down-to-earth solutions: writing it down.
Second, I want to share my knowledge with people around the world. Even if nobody reads this article, I write it for myself.
HOW
In this part, I will explain how I combine Fabric and dbt using my code and experience. In building a Data Warehouse, there is always a lot of SQL code to manage. Dbt has become a great tool for managing SQL pipelines worldwide. Unfortunately, dbt is not fully compatible with Fabric at the moment. The main reason is that Fabric is new and in preview mode. Sometimes I face challenges because Fabric lacks some features in Data Factory Service, Notebook, and Data Warehouse. Additionally, dbt Cloud does not support Fabric, so I have to build many things myself.
This article shares my experience of using dbt step by step in Microsoft Fabric. Before we continue, please be aware that this article assumes the reader knows:
Overview
There are some key components in this solution:
Some highlights of this solution:
Let's go through step by step:
Build Warehouse Sample
If you are already in the Fabric Preview because you are using Power BI like me, then it is easy to build a warehouse sample. Alternatively, you can try it through the provided link. After that, we create a new Workspace in License mode = Trial Fabric. Let's call it "DBTWS".
"This is my favorite part: I can build a Synap Data Warehouse with just one click. As a computer geek, I'm always eager to use it. Here's how you can do it too:
Sure, take your time and enjoy a cup of coffee while the Fabric creates the warehouse for us. This process may take a few moments. Once the warehouse is ready, we can proceed to use the sample database. Keep an eye on the screen as it may display a message saying "Loading sample data. This may take a few minutes...". Sometimes it can take around 10 to 20 minutes to complete. If it takes longer, feel free to have another cup of coffee to keep you refreshed. Once the sample data is loaded, we'll be ready to dive into the exciting features of the Synap Data Warehouse
Now that you have the data in the date table, you can begin exploring and analyzing it as you need it.
To retrieve the connection string, I performed the following steps:
领英推荐
Simple DBT project.
Before we continue, please be aware that this article assumes that the reader has already completed the following steps::
If you have never used dbt (data build tool) before, I recommend going through my previous article where I have created a sample project. It will give you a good understanding of the basics. For this sample project, I aim to keep it even simpler.
Here are the steps to initialize the dbt project and choose the "fabric" option:
SELECT TOP 1 * FROM Date
4. Add a new file profiles.yml; you will enter your info by youseft. After that, you will see our dbt project like this:
dockerdbt:
? target: dev
? outputs:
? ? dev:
? ? ? type: fabric
? ? ? driver: 'ODBC Driver 18 for SQL Server' # (The ODBC Driver installed on your system)
? ? ? server: 'xxx' ? ? ? ? ? ? ? ? ? ? ? ? ? # Connect String which I got before
? ? ? port: 1433 ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?# Default port = 1433
? ? ? database: WH ? ? ? ? ? ? ? ? ? ? ? ? ? ?# Warehouse Name
? ? ? schema: dbo ? ? ? ? ? ? ? ? ? ? ? ? ? ? # Schema Name
? ? ? authentication: ActiveDirectoryPassword # Authen by AD User & Pass
? ? ? user: xxx ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? # Username
? ? ? password: xxx ? ? ? ? ? ? ? ? ? ? ? ? ? # Password
5. Finally, I should run some codes to debug and build my Project. By that way, I can confirm my dbt project worked.
Push Code to GitHub
I used GitHub to store my project and Azure Studio Code made it supper easy.
Build Data Factory - Pipeline - Notebook
Moving to Microsoft Fabric, I created a new NoteBook to pull code from GitHub and run the dbt project.
Explanation:
Step 1: Install dbt-fabric in Notebook.
Step 2: Clone the dbt project in GitHub.
Step 3: Run the project.
Step1:
pip install -U dbt-fabric
Step2:
%%bash
git clone "https://{PrivateKey}@github.com/trantrongw/SanpleDBT2Fabric.git" "SanpleDBT2Fabric" 2> /dev/null || (cd "SanpleDBT2Fabric" ; git pull)
Step3:
%%bash
cd SanpleDBT2Fabric/Sample
dbt debug
Create a pipeline then add the notebook for automation. (1) Add notebook to canvas then (2) Config the notebook (3) Save
Git Installs on Runtime 1.2
1) Create a new YAML file with the following content. Name this file as "update_env.yml"
name: Fabric1.2_GitWorkaround
channels:
? - conda-forge
? - defaults
dependencies:
? - git
2) Create a new environment from Data Science or Data Engineering
Name the environment something like "Fabric1.2WithGit".
3) In the new?"Fabric1.2WithGit", under "public libraries", select "add from .yml"
4) Import "update_env.yml" file created in step 1.
5) Select the public library "git", click "Save" and then "Publish". Please be patient, it will take a few min to create the environment. Ensure the environment is published successfully (notification will be delivered).
6) Under the notebook, select "Fabric1.2WithGit", you will have git working
Conclusion
I am so proud of myself. This is all relatively new to me and I’m still learning about Python and Notebooks, so if you see anything where I’m not doing things the “right way” feel free to point them out