Which Airflow syntax do you prefer?

When connecting operators in an Airflow DAG, we have two broad options - use the chevron notation (the >> and << operators) or use functions. With functions, we again have two options - set_upstream and set_downstream.

Initially, when I wrote a DAG generator, I used the chevron notation. But when I had to write a more complex DAG generator, I went for the function approach.

Consider the following two examples.

Using chevron

# Note: list >> list is not valid - Python lists do not support the >> operator,
# so the list-to-list links need cross_downstream.
from airflow.models.baseoperator import cross_downstream

t_t01 >> t_t02 >> t_t03 >> t_t04 >> [t_t05, t_t06, t_t07, t_t08, t_t09, t_t10, t_t11]
cross_downstream([t_t05, t_t06, t_t07, t_t08, t_t09, t_t10, t_t11], [t_t12, t_t13])
cross_downstream([t_t12, t_t13], [t_t14, t_t15, t_t16])
[t_t14, t_t15, t_t16] >> t_end

Using functions

t_t02.set_upstream(t_t01)

t_t03.set_upstream(t_t02)

t_t04.set_upstream(t_t03)

t_t05.set_upstream(t_t04)
t_t06.set_upstream(t_t04)
t_t07.set_upstream(t_t04)
t_t08.set_upstream(t_t04)
t_t09.set_upstream(t_t04)
t_t10.set_upstream(t_t04)
t_t11.set_upstream(t_t04)

t_t12.set_upstream(t_t05)
t_t12.set_upstream(t_t06)
t_t12.set_upstream(t_t07)
t_t12.set_upstream(t_t08)
t_t12.set_upstream(t_t09)
t_t12.set_upstream(t_t10)
t_t12.set_upstream(t_t11)
t_t13.set_upstream(t_t05)
t_t13.set_upstream(t_t06)
t_t13.set_upstream(t_t07)
t_t13.set_upstream(t_t08)
t_t13.set_upstream(t_t09)
t_t13.set_upstream(t_t10)
t_t13.set_upstream(t_t11)

t_t14.set_upstream(t_t12)
t_t14.set_upstream(t_t13)
t_t15.set_upstream(t_t12)
t_t15.set_upstream(t_t13)
t_t16.set_upstream(t_t12)
t_t16.set_upstream(t_t13)

t_end.set_upstream(t_t14)
t_end.set_upstream(t_t15)
t_end.set_upstream(t_t16)        

The functions method is verbose, but I prefer it.
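Some of that verbosity can be trimmed: as far as I know, set_upstream also accepts a list of tasks, which collapses each fan-in block into a single call. Here is a minimal sketch using a stand-in Task class (so the wiring can be checked without an Airflow installation; real code would use operators):

```python
class Task:
    """Stand-in for an Airflow operator - just enough to model set_upstream."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.upstream_ids = set()

    def set_upstream(self, other):
        # Like Airflow's BaseOperator.set_upstream, accept a task or a list of tasks.
        tasks = other if isinstance(other, list) else [other]
        for t in tasks:
            self.upstream_ids.add(t.task_id)


t_t05, t_t06, t_t12 = Task("t_t05"), Task("t_t06"), Task("t_t12")

# One call instead of one line per upstream task:
t_t12.set_upstream([t_t05, t_t06])
print(sorted(t_t12.upstream_ids))  # ['t_t05', 't_t06']
```

With the list form, each of the fan-in sections above shrinks to a single set_upstream call per downstream task.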

One of the reasons I prefer the function approach is that I was able to write an application that reads a DAG file and depicts the relationships using a Sankey chart or pyvis. The set_upstream calls provided clear markers for this parsing.
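As a sketch of that idea (a simplified version of what such an application might do - it only handles the single-task form of set_upstream, and the function and sample names are illustrative), the calls can be pulled out of a DAG file with a regular expression and turned into an edge list ready for a Sankey chart or pyvis:

```python
import re

# Matches lines like: t_t12.set_upstream(t_t05)
EDGE_RE = re.compile(r"(\w+)\.set_upstream\((\w+)\)")


def extract_edges(dag_source):
    """Return (upstream, downstream) pairs found in the DAG source text."""
    return [(up, down) for down, up in EDGE_RE.findall(dag_source)]


sample = """
t_t02.set_upstream(t_t01)
t_t03.set_upstream(t_t02)
"""
print(extract_edges(sample))  # [('t_t01', 't_t02'), ('t_t02', 't_t03')]
```

Each pair then becomes one flow in the Sankey chart, or one edge passed to pyvis.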

I know you will say 'what a stupid idea'. Airflow renders the DAG on its canvas. Correct. Now imagine you have to share the output with someone. With Airflow, we have to keep grabbing screenshots . . .

#airflow #sankeychart

More articles by Bipin Patwardhan