Full vs. Incremental Loads – Data Engineering with Fabric
Loading data from a source system to a target system has been well documented over the years. My first introduction to an Extract, Transform, and Load (ETL) program was Data Transformation Services (DTS) for SQL Server 7.0 in 1998.
In a data lake, the bronze quality zone represents the raw data stored in the Delta file format. This zone might retain multiple versions of each file for auditing. The silver quality zone holds a single version of the truth: data that has been de-duplicated and cleaned. How can we achieve these goals using the Apache Spark engine in Microsoft Fabric?
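Before diving in, here is a minimal sketch of that bronze-to-silver pattern in PySpark, assuming a Fabric notebook where the spark session is predefined. The paths, table names, and key columns below are hypothetical placeholders, not the article's actual dataset.

```python
# Minimal sketch of the bronze -> silver pattern in a Fabric notebook.
# Paths, table names, and key columns are illustrative assumptions.
from pyspark.sql import functions as F
from pyspark.sql.window import Window

# Bronze: append raw CSV files as-is to a Delta table. Delta keeps prior
# versions of the table, which supports auditing via time travel.
raw_df = (
    spark.read.format("csv")
    .option("header", "true")
    .load("Files/raw/weather/")
    .withColumn("load_ts", F.current_timestamp())
)
raw_df.write.format("delta").mode("append").saveAsTable("bronze_weather")

# Silver: keep only the most recent reading per natural key, giving a
# single, de-duplicated version of the truth.
w = Window.partitionBy("station_id", "reading_date").orderBy(F.col("load_ts").desc())
silver_df = (
    spark.table("bronze_weather")
    .withColumn("rn", F.row_number().over(w))
    .filter("rn = 1")
    .drop("rn")
)
silver_df.write.format("delta").mode("overwrite").saveAsTable("silver_weather")
```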
Business Problem
Our manager has given us weather data to load into Microsoft Fabric. In the last article, we re-organized the full load sub-directory to have two complete data files (high temperatures and low temperatures) for each day of the sample week. As a result, we have 14 total files.
The incremental load sub-directory has two files per day for 1,369 days; the high and low temperature files for a given day each contain a single reading, for a total of 2,738 files. How can we create a notebook that erases existing tables if needed, rebuilds the full load tables, and rebuilds the incremental tables?
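As a rough sketch of what such a notebook might do, the rebuild step can drop any prior tables with Spark SQL and then reload every file in a folder in one pass. The table and folder names here are assumptions for illustration only.

```python
# Hedged sketch of the drop-and-rebuild notebook logic; table and folder
# names are illustrative assumptions.
for tbl in ["bronze_high_temps", "bronze_low_temps"]:
    # Erase the existing table if present so the rebuild starts clean.
    spark.sql(f"DROP TABLE IF EXISTS {tbl}")

# Full load: Spark reads all the daily files in the folder with one call.
(
    spark.read.format("csv")
    .option("header", "true")
    .load("Files/full/high_temps/*.csv")
    .write.format("delta")
    .saveAsTable("bronze_high_temps")
)
```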
Technical Solution
This use case allows data engineers to learn how to transform data using both Spark SQL and Spark DataFrames; a brief sketch contrasting the two styles appears at the end of this section. The following topics will be explored in this article.
Please read the previous article in this series on SQL Server Central for more details.
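As promised above, here is a brief sketch showing the same transformation written in both styles; the table and column names are hypothetical placeholders.

```python
# The same filter expressed two ways; names are illustrative assumptions.

# Style 1: Spark SQL against a registered table.
hot_sql = spark.sql(
    "SELECT station_id, reading_date, temp_f "
    "FROM silver_weather WHERE temp_f > 90"
)

# Style 2: the equivalent Spark DataFrame expression.
hot_df = (
    spark.table("silver_weather")
    .filter("temp_f > 90")
    .select("station_id", "reading_date", "temp_f")
)
```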