登录查看更多内容

DBT in Microsoft Fabric: MY JOURNAL

Nghia Tran

Solutions Consultant at CMC

发布日期: 2023年6月27日

+ 关注

Update 20231114: Git Installs on Runtime 1.2 (link) - thanks venkims fromMS.

Update 20231108: Change runtime version in Spark Setting from 1.2 to 1.1 (Runtime version 1.2 does not have git).

Update 20230930: Remove docker and azure parts.

Init 20230627: Publication date.

Hi, my name is Nghia. This is my journal on how I applied Microsoft Fabric combined with dbt to a project in preparation for new customers in the future.

As I work with data, I always focus on the pattern: What - Why - How.

What

Microsoft Fabric is a new unified data platform by Microsoft that provides everything we need to build a Data System. We can build a Data Warehouse inside Fabric.

Dbt is a data transformation workflow that helps you get more work done while producing higher-quality results.

I want to combine both of them together, as I believe it will have a positive impact.

Why

First, when I recently did a workshop about Fabric, it wasn't very practical. I tried live coding and explaining, but it didn't work well. Not many people understood what I was saying. Maybe my explanation was not clear enough. I am down-to-earth, so I need down-to-earth solutions: writing it down.

Second, I want to share my knowledge with people around the world. Even if nobody reads this article, I write it for myself.

HOW

In this part, I will explain how I combine Fabric and dbt using my code and experience. In building a Data Warehouse, there is always a lot of SQL code to manage. Dbt has become a great tool for managing SQL pipelines worldwide. Unfortunately, dbt is not fully compatible with Fabric at the moment. The main reason is that Fabric is new and in preview mode. Sometimes I face challenges because Fabric lacks some features in Data Factory Service, Notebook, and Data Warehouse. Additionally, dbt Cloud does not support Fabric, so I have to build many things myself.

This article shares my experience of using dbt step by step in Microsoft Fabric. Before we continue, please be aware that this article assumes the reader knows:

How to use dbt and already has their dbt project set up and ready to use.
You are trying to use Microsoft Fabric and are familiar with it.

Overview

There are some key components in this solution:

Dbt Project: Develop a Dbt project locally and build a CI/CD in Github.
Data Warehouse: It is built using the Warehouse sample.
In the Notebook, I pull code from Git Hub and run the dbt project.

Some highlights of this solution:

Microsoft Fabric: It provides all I need to build a data platform.
Cost-effective: The cost to run Microsoft Fabric and dbt-core is free.

Let's go through step by step:

Build Warehouse Sample

If you are already in the Fabric Preview because you are using Power BI like me, then it is easy to build a warehouse sample. Alternatively, you can try it through the provided link. After that, we create a new Workspace in License mode = Trial Fabric. Let's call it "DBTWS".

"This is my favorite part: I can build a Synap Data Warehouse with just one click. As a computer geek, I'm always eager to use it. Here's how you can do it too:

In the bottom left corner, switch to Data Warehouse.
Click on 'New Warehouse (preview)'.
Fill in the required 'Name*' field. I named mine 'WH'. You can refer to the image below for a visual representation of the steps.

Sure, take your time and enjoy a cup of coffee while the Fabric creates the warehouse for us. This process may take a few moments. Once the warehouse is ready, we can proceed to use the sample database. Keep an eye on the screen as it may display a message saying "Loading sample data. This may take a few minutes...". Sometimes it can take around 10 to 20 minutes to complete. If it takes longer, feel free to have another cup of coffee to keep you refreshed. Once the sample data is loaded, we'll be ready to dive into the exciting features of the Synap Data Warehouse

Now that you have the data in the date table, you can begin exploring and analyzing it as you need it.

To retrieve the connection string, I performed the following steps:

Clicked on the three dots (ellipsis) in the DW.
A new menu tooltip appeared on the screen.
Selected "Copy SQL connection string" from the menu.
A new dialog box appeared, where you clicked on the "Copy" button.
You pasted the copied connection string into a note

领英推荐

101 Guide on Apache Airflow Operators

Censius 2 年前

Databricks: A Contemporary Solution for Today’s Data…

Analytics8 | Data & Analytics Consultancy 2 年前

Ensuring Data Quality in Databricks with Great…

Machine Learning Reply GmbH 1 年前

Simple DBT project.

Before we continue, please be aware that this article assumes that the reader has already completed the following steps::

Installed Python (version 3.9 or 3.10). You can find the installation Here
Installed Git, preferably the latest version. You can find the installation Here
Installed the ODBC driver version 18 for Windows. Make sure you have the appropriate driver installed for your database. You can find the download Here
Installed dbt-fabric, which is a dbt adapter for Fabric Data Warehouse. You can find more information and installation instructions Here
Installed Visual Studio Code, a popular code editor. You can download it from the official website here.

If you have never used dbt (data build tool) before, I recommend going through my previous article where I have created a sample project. It will give you a good understanding of the basics. For this sample project, I aim to keep it even simpler.

Here are the steps to initialize the dbt project and choose the "fabric" option:

Open your terminal or command prompt and navigate to the desired directory where you want to create your dbt project. Run the following command to initialize a new dbt project: dbt init then pick the fabric adapter.
Delete 2 default files: my_second_dbt_model.sql, schema.yml.
Edit my_first_dbt_model.sql:

SELECT TOP 1 * FROM Date

4. Add a new file profiles.yml; you will enter your info by youseft. After that, you will see our dbt project like this:

dockerdbt:
? target: dev
? outputs:
? ? dev:
? ? ? type: fabric
? ? ? driver: 'ODBC Driver 18 for SQL Server' # (The ODBC Driver installed on your system)
? ? ? server: 'xxx' ? ? ? ? ? ? ? ? ? ? ? ? ? # Connect String which I got before
? ? ? port: 1433 ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?# Default port = 1433
? ? ? database: WH ? ? ? ? ? ? ? ? ? ? ? ? ? ?# Warehouse Name
? ? ? schema: dbo ? ? ? ? ? ? ? ? ? ? ? ? ? ? # Schema Name
? ? ? authentication: ActiveDirectoryPassword # Authen by AD User & Pass
? ? ? user: xxx ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? # Username
? ? ? password: xxx ? ? ? ? ? ? ? ? ? ? ? ? ? # Password

5. Finally, I should run some codes to debug and build my Project. By that way, I can confirm my dbt project worked.

Push Code to GitHub

I used GitHub to store my project and Azure Studio Code made it supper easy.

Build Data Factory - Pipeline - Notebook

Moving to Microsoft Fabric, I created a new NoteBook to pull code from GitHub and run the dbt project.

Explanation:

Step 1: Install dbt-fabric in Notebook.

Step 2: Clone the dbt project in GitHub.

Step 3: Run the project.

Step1:
pip install -U dbt-fabric

Step2:
%%bash
git clone "https://{PrivateKey}@github.com/trantrongw/SanpleDBT2Fabric.git" "SanpleDBT2Fabric" 2> /dev/null || (cd "SanpleDBT2Fabric" ; git pull)

Step3:
%%bash
cd SanpleDBT2Fabric/Sample
dbt debug

Create a pipeline then add the notebook for automation. (1) Add notebook to canvas then (2) Config the notebook (3) Save

Git Installs on Runtime 1.2

1) Create a new YAML file with the following content. Name this file as "update_env.yml"

name: Fabric1.2_GitWorkaround
channels:
? - conda-forge
? - defaults
dependencies:
? - git

2) Create a new environment from Data Science or Data Engineering

Name the environment something like "Fabric1.2WithGit".

3) In the new?"Fabric1.2WithGit", under "public libraries", select "add from .yml"

4) Import "update_env.yml" file created in step 1.

5) Select the public library "git", click "Save" and then "Publish". Please be patient, it will take a few min to create the environment. Ensure the environment is published successfully (notification will be delivered).

6) Under the notebook, select "Fabric1.2WithGit", you will have git working

Conclusion

I am so proud of myself. This is all relatively new to me and I’m still learning about Python and Notebooks, so if you see anything where I’m not doing things the “right way” feel free to point them out

查看更多评论

要查看或添加评论，请登录

Nghia Tran的更多文章

Microsoft Fabric Licensing Calculator

2023年11月29日

Microsoft Fabric Licensing Calculator

Updated image Full Link
RLS in Microsoft Fabric Dataset

2023年10月10日

RLS in Microsoft Fabric Dataset

Some limitations of Direct Lake in its current Public Preview state include the inability to create row-level security…
Real-time Analytics: Streaming SQL Server Data to Fabric using Debezium

2023年9月28日

Real-time Analytics: Streaming SQL Server Data to Fabric using Debezium

Introduction/What This article draws a method to stream data in real-time from SQL Server to Fabric using Debezium…
Power BI Custom Connector: Build a Module's Lineage View

2023年8月9日

Power BI Custom Connector: Build a Module's Lineage View

What The scenario we are dealing with is as follows: Power Query connectors do not support Lineage View in Visual…

4 条评论
Microsoft Fabric With dbt

2023年6月16日

Microsoft Fabric With dbt

UPDATE 2023-06-28: You can find more complete detailed and optimized examples for this content in the DBT in Microsoft…
Dynamic format strings in Power BI

2023年4月13日

Dynamic format strings in Power BI

V?n ?? Các báo cáo tài chính c?a Vi?t Nam th??ng có con s? r?t l?n t? ??n vài nghìn t?, do v?y h? th??ng có nhu c?u…

1 条评论
Load Azure to Snowflake by dbt

2023年4月12日

Load Azure to Snowflake by dbt

V?n ?? Sau khi d? li?u ???c trích xu?t t? các Datasource và ???c ?áp xu?ng Azure Data Lake. Ta c?n ??a d? li?u t? Lake…

1 条评论
Camera AI và Power BI ?? theo d?i k? n?ng c?a nhan viên real-time.

2023年4月4日

Camera AI và Power BI ?? theo d?i k? n?ng c?a nhan viên real-time.

M?c tiêu Data Team xay d?ng 1 ??a ?i?m ?? nhan viên có th? xem th?ng tin k? n?ng mà mình ?? h?c khi tham gia các l?p…
Auto refresh data in Power BI Desktop

2023年4月4日

Auto refresh data in Power BI Desktop

Power BI Desktop là phiên b?n Free cho phép chúng ta k?t n?i/x? ly/ v? bi?u ?? chia s? file ??n cho ng??i khác. Nh?ng…

3 条评论
Data Governance: Audit Power BI Tenant Settings

2023年4月3日

Data Governance: Audit Power BI Tenant Settings

Gi?i thi?u Bài vi?t này s? cung c?p cho b?n h??ng d?n t?ng b??c v? cách l?y + ki?m tra các c?u hình d?ch v? Power BI và…

See all articles

DBT in Microsoft Fabric: MY JOURNAL

Nghia Tran

Solutions Consultant at CMC

What

Why

HOW

Overview

Build Warehouse Sample

领英推荐

Simple DBT project.

Push Code to GitHub

Build Data Factory - Pipeline - Notebook

Git Installs on Runtime 1.2

Conclusion

Nghia Tran的更多文章

社区洞察

其他会员也浏览了

DoubleCloud’s 14th Product Update

A unified platform with Databricks & dbt

No-Code Payrun ETL for Snowflake, Databricks, Bigquery, Azure Synapse & Redshift

No-Code InterSystems Cache ETL for Snowflake, Bigquery, Azure Synapse & Redshift

Real-time Data Processing with Google Dataflow

No-Code Sage Intacct ETL for Snowflake, Bigquery, Azure Synapse & Redshift

No-Code Accelo ETL for Snowflake, Bigquery, Azure Synapse & Redshift

DataBricks: Guide for Beginners

No-Code ADP ETL for Snowflake, Bigquery, Azure Synapse & Redshift

Why do we use Apache Spark and Clickhouse as Foundational Engines at Datazone?

What

Why

HOW

Overview

Build Warehouse Sample

领英推荐

Simple DBT project.

Push Code to GitHub

Build Data Factory - Pipeline - Notebook

Git Installs on Runtime 1.2

Conclusion

Nghia Tran的更多文章

Microsoft Fabric Licensing Calculator

RLS in Microsoft Fabric Dataset

Real-time Analytics: Streaming SQL Server Data to Fabric using Debezium

Power BI Custom Connector: Build a Module's Lineage View

Microsoft Fabric With dbt

Dynamic format strings in Power BI

Load Azure to Snowflake by dbt

Camera AI và Power BI ?? theo d?i k? n?ng c?a nhan viên real-time.

Auto refresh data in Power BI Desktop

Data Governance: Audit Power BI Tenant Settings

社区洞察

其他会员也浏览了

DoubleCloud’s 14th Product Update

A unified platform with Databricks & dbt

No-Code Payrun ETL for Snowflake, Databricks, Bigquery, Azure Synapse & Redshift

No-Code InterSystems Cache ETL for Snowflake, Bigquery, Azure Synapse & Redshift

Real-time Data Processing with Google Dataflow

No-Code Sage Intacct ETL for Snowflake, Bigquery, Azure Synapse & Redshift

No-Code Accelo ETL for Snowflake, Bigquery, Azure Synapse & Redshift

DataBricks: Guide for Beginners

No-Code ADP ETL for Snowflake, Bigquery, Azure Synapse & Redshift

Why do we use Apache Spark and Clickhouse as Foundational Engines at Datazone?