DBT in Microsoft Fabric: MY JOURNAL

DBT in Microsoft Fabric: MY JOURNAL

Update 20231114: Git Installs on Runtime 1.2 (link) - thanks venkims fromMS.

Update 20231108: Change runtime version in Spark Setting from 1.2 to 1.1 (Runtime version 1.2 does not have git).

Update 20230930: Remove docker and azure parts.

Init 20230627: Publication date.


Hi, my name is Nghia. This is my journal on how I applied Microsoft Fabric combined with dbt to a project in preparation for new customers in the future.

As I work with data, I always focus on the pattern: What - Why - How.

What

Microsoft Fabric is a new unified data platform by Microsoft that provides everything we need to build a Data System. We can build a Data Warehouse inside Fabric.

Dbt is a data transformation workflow that helps you get more work done while producing higher-quality results.

I want to combine both of them together, as I believe it will have a positive impact.

Why

First, when I recently did a workshop about Fabric, it wasn't very practical. I tried live coding and explaining, but it didn't work well. Not many people understood what I was saying. Maybe my explanation was not clear enough. I am down-to-earth, so I need down-to-earth solutions: writing it down.

Second, I want to share my knowledge with people around the world. Even if nobody reads this article, I write it for myself.

HOW

In this part, I will explain how I combine Fabric and dbt using my code and experience. In building a Data Warehouse, there is always a lot of SQL code to manage. Dbt has become a great tool for managing SQL pipelines worldwide. Unfortunately, dbt is not fully compatible with Fabric at the moment. The main reason is that Fabric is new and in preview mode. Sometimes I face challenges because Fabric lacks some features in Data Factory Service, Notebook, and Data Warehouse. Additionally, dbt Cloud does not support Fabric, so I have to build many things myself.

This article shares my experience of using dbt step by step in Microsoft Fabric. Before we continue, please be aware that this article assumes the reader knows:

  • How to use dbt and already has their dbt project set up and ready to use.
  • You are trying to use Microsoft Fabric and are familiar with it.

Overview


There are some key components in this solution:

  1. Dbt Project: Develop a Dbt project locally and build a CI/CD in Github.
  2. Data Warehouse: It is built using the Warehouse sample.
  3. In the Notebook, I pull code from Git Hub and run the dbt project.

Some highlights of this solution:

  • Microsoft Fabric: It provides all I need to build a data platform.
  • Cost-effective: The cost to run Microsoft Fabric and dbt-core is free.

Let's go through step by step:

Build Warehouse Sample

If you are already in the Fabric Preview because you are using Power BI like me, then it is easy to build a warehouse sample. Alternatively, you can try it through the provided link. After that, we create a new Workspace in License mode = Trial Fabric. Let's call it "DBTWS".

No alt text provided for this image

"This is my favorite part: I can build a Synap Data Warehouse with just one click. As a computer geek, I'm always eager to use it. Here's how you can do it too:

  1. In the bottom left corner, switch to Data Warehouse.
  2. Click on 'New Warehouse (preview)'.
  3. Fill in the required 'Name*' field. I named mine 'WH'. You can refer to the image below for a visual representation of the steps.

No alt text provided for this image

Sure, take your time and enjoy a cup of coffee while the Fabric creates the warehouse for us. This process may take a few moments. Once the warehouse is ready, we can proceed to use the sample database. Keep an eye on the screen as it may display a message saying "Loading sample data. This may take a few minutes...". Sometimes it can take around 10 to 20 minutes to complete. If it takes longer, feel free to have another cup of coffee to keep you refreshed. Once the sample data is loaded, we'll be ready to dive into the exciting features of the Synap Data Warehouse

No alt text provided for this image

Now that you have the data in the date table, you can begin exploring and analyzing it as you need it.

No alt text provided for this image

To retrieve the connection string, I performed the following steps:

  1. Clicked on the three dots (ellipsis) in the DW.
  2. A new menu tooltip appeared on the screen.
  3. Selected "Copy SQL connection string" from the menu.
  4. A new dialog box appeared, where you clicked on the "Copy" button.
  5. You pasted the copied connection string into a note


No alt text provided for this image

Simple DBT project.

Before we continue, please be aware that this article assumes that the reader has already completed the following steps::

  • Installed Python (version 3.9 or 3.10). You can find the installation Here
  • Installed Git, preferably the latest version. You can find the installation Here
  • Installed the ODBC driver version 18 for Windows. Make sure you have the appropriate driver installed for your database. You can find the download Here
  • Installed dbt-fabric, which is a dbt adapter for Fabric Data Warehouse. You can find more information and installation instructions Here
  • Installed Visual Studio Code, a popular code editor. You can download it from the official website here.

If you have never used dbt (data build tool) before, I recommend going through my previous article where I have created a sample project. It will give you a good understanding of the basics. For this sample project, I aim to keep it even simpler.

Here are the steps to initialize the dbt project and choose the "fabric" option:

  1. Open your terminal or command prompt and navigate to the desired directory where you want to create your dbt project. Run the following command to initialize a new dbt project: dbt init then pick the fabric adapter.
  2. Delete 2 default files: my_second_dbt_model.sql, schema.yml.
  3. Edit my_first_dbt_model.sql:

SELECT TOP 1 * FROM Date        

4. Add a new file profiles.yml; you will enter your info by youseft. After that, you will see our dbt project like this:

dockerdbt:
? target: dev
? outputs:
? ? dev:
? ? ? type: fabric
? ? ? driver: 'ODBC Driver 18 for SQL Server' # (The ODBC Driver installed on your system)
? ? ? server: 'xxx' ? ? ? ? ? ? ? ? ? ? ? ? ? # Connect String which I got before
? ? ? port: 1433 ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?# Default port = 1433
? ? ? database: WH ? ? ? ? ? ? ? ? ? ? ? ? ? ?# Warehouse Name
? ? ? schema: dbo ? ? ? ? ? ? ? ? ? ? ? ? ? ? # Schema Name
? ? ? authentication: ActiveDirectoryPassword # Authen by AD User & Pass
? ? ? user: xxx ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? # Username
? ? ? password: xxx ? ? ? ? ? ? ? ? ? ? ? ? ? # Password        


No alt text provided for this image

5. Finally, I should run some codes to debug and build my Project. By that way, I can confirm my dbt project worked.

No alt text provided for this image

Push Code to GitHub

I used GitHub to store my project and Azure Studio Code made it supper easy.

Build Data Factory - Pipeline - Notebook

Moving to Microsoft Fabric, I created a new NoteBook to pull code from GitHub and run the dbt project.

Explanation:

Step 1: Install dbt-fabric in Notebook.

Step 2: Clone the dbt project in GitHub.

Step 3: Run the project.

Step1:
pip install -U dbt-fabric

Step2:
%%bash
git clone "https://{PrivateKey}@github.com/trantrongw/SanpleDBT2Fabric.git" "SanpleDBT2Fabric" 2> /dev/null || (cd "SanpleDBT2Fabric" ; git pull)

Step3:
%%bash
cd SanpleDBT2Fabric/Sample
dbt debug        

Create a pipeline then add the notebook for automation. (1) Add notebook to canvas then (2) Config the notebook (3) Save

No alt text provided for this image

Git Installs on Runtime 1.2

1) Create a new YAML file with the following content. Name this file as "update_env.yml"

name: Fabric1.2_GitWorkaround
channels:
? - conda-forge
? - defaults
dependencies:
? - git        

2) Create a new environment from Data Science or Data Engineering

Name the environment something like "Fabric1.2WithGit".

3) In the new?"Fabric1.2WithGit", under "public libraries", select "add from .yml"

4) Import "update_env.yml" file created in step 1.

5) Select the public library "git", click "Save" and then "Publish". Please be patient, it will take a few min to create the environment. Ensure the environment is published successfully (notification will be delivered).

6) Under the notebook, select "Fabric1.2WithGit", you will have git working


Conclusion

I am so proud of myself. This is all relatively new to me and I’m still learning about Python and Notebooks, so if you see anything where I’m not doing things the “right way” feel free to point them out

要查看或添加评论,请登录

Nghia Tran的更多文章

社区洞察

其他会员也浏览了