Which Airflow syntax do you prefer?
When connecting operators in an Airflow DAG, we have two broad options: use the chevron (the >> and << bitshift operators) or use functions. With functions, we again have two options: set_upstream and set_downstream.
Initially, when I wrote a DAG generator, I used the chevron notation. But, when I had to write a more complex DAG generator, I went for the function approach.
Consider the following two examples.
Using chevron
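from airflow.models.baseoperator import cross_downstream  # import path for Airflow 2.x
t_t01 >> t_t02 >> t_t03 >> t_t04 >> [t_t05, t_t06, t_t07, t_t08, t_t09, t_t10, t_t11]
# >> cannot join a list to another list, so the list-to-list steps use cross_downstream
cross_downstream([t_t05, t_t06, t_t07, t_t08, t_t09, t_t10, t_t11], [t_t12, t_t13])
cross_downstream([t_t12, t_t13], [t_t14, t_t15, t_t16])
[t_t14, t_t15, t_t16] >> t_end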
Using functions
t_t02.set_upstream(t_t01)
t_t03.set_upstream(t_t02)
t_t04.set_upstream(t_t03)
t_t05.set_upstream(t_t04)
t_t06.set_upstream(t_t04)
t_t07.set_upstream(t_t04)
t_t08.set_upstream(t_t04)
t_t09.set_upstream(t_t04)
t_t10.set_upstream(t_t04)
t_t11.set_upstream(t_t04)
t_t12.set_upstream(t_t05)
t_t12.set_upstream(t_t06)
t_t12.set_upstream(t_t07)
t_t12.set_upstream(t_t08)
t_t12.set_upstream(t_t09)
t_t12.set_upstream(t_t10)
t_t12.set_upstream(t_t11)
t_t13.set_upstream(t_t05)
t_t13.set_upstream(t_t06)
t_t13.set_upstream(t_t07)
t_t13.set_upstream(t_t08)
t_t13.set_upstream(t_t09)
t_t13.set_upstream(t_t10)
t_t13.set_upstream(t_t11)
t_t14.set_upstream(t_t12)
t_t14.set_upstream(t_t13)
t_t15.set_upstream(t_t12)
t_t15.set_upstream(t_t13)
t_t16.set_upstream(t_t12)
t_t16.set_upstream(t_t13)
t_end.set_upstream(t_t14)
t_end.set_upstream(t_t15)
t_end.set_upstream(t_t16)
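Nobody has to type these lines by hand when a generator is involved. As an illustration only (this is a minimal sketch, not the actual DAG generator mentioned above), the lines can be emitted from a layered description of the DAG. The names layers and emit_set_upstream are made up for this example, and the sketch assumes every task in one stage depends on every task in the previous stage.

```python
# Hypothetical layered description of the DAG above; each inner list is one stage.
layers = [
    ["t_t01"], ["t_t02"], ["t_t03"], ["t_t04"],
    ["t_t05", "t_t06", "t_t07", "t_t08", "t_t09", "t_t10", "t_t11"],
    ["t_t12", "t_t13"],
    ["t_t14", "t_t15", "t_t16"],
    ["t_end"],
]


def emit_set_upstream(layers):
    """Return one 'child.set_upstream(parent)' line per edge between adjacent stages."""
    lines = []
    # Pair each stage with the stage that follows it.
    for parents, children in zip(layers, layers[1:]):
        for child in children:
            for parent in parents:
                lines.append(f"{child}.set_upstream({parent})")
    return lines


dependency_lines = emit_set_upstream(layers)
print("\n".join(dependency_lines))
```

The output is plain text, one dependency per line, in the same order as the listing above.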
The functions method is verbose, but I prefer it.
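One of the reasons I prefer the functions approach is that I was able to write an application to read a DAG and depict the relationships using a Sankey chart or using pyvis. The function approach provided clear markers for this activity: each line names exactly one downstream task and one upstream task, so every relationship can be picked out on its own.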
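To show why one dependency per line is easy to work with, here is a minimal sketch of the parsing step, using only the standard library. It is not the application described above. The names SET_UPSTREAM, extract_edges and to_sankey_indices are hypothetical, and the sketch assumes every call passes a single task name (set_upstream can also take a list, which this pattern would not catch). The label list and the source/target index lists are the general shape a Sankey chart needs; pyvis could be fed the edge pairs directly.

```python
import re

# Matches lines such as: t_t02.set_upstream(t_t01)
SET_UPSTREAM = re.compile(r"^\s*(\w+)\.set_upstream\(\s*(\w+)\s*\)")


def extract_edges(dag_source):
    """Return (upstream, downstream) pairs found in the DAG source text."""
    edges = []
    for line in dag_source.splitlines():
        match = SET_UPSTREAM.match(line)
        if match:
            downstream, upstream = match.groups()
            edges.append((upstream, downstream))
    return edges


def to_sankey_indices(edges):
    """Turn edges into a label list plus parallel source/target index lists."""
    labels = []
    for upstream, downstream in edges:
        for name in (upstream, downstream):
            if name not in labels:
                labels.append(name)
    sources = [labels.index(upstream) for upstream, _ in edges]
    targets = [labels.index(downstream) for _, downstream in edges]
    return labels, sources, targets


# A few lines from the listing above, used as sample input.
sample = """
t_t02.set_upstream(t_t01)
t_t03.set_upstream(t_t02)
t_t04.set_upstream(t_t03)
t_t05.set_upstream(t_t04)
t_t06.set_upstream(t_t04)
"""

edges = extract_edges(sample)
labels, sources, targets = to_sankey_indices(edges)
print(edges)
print(labels, sources, targets)
```

A chained chevron line would need a real expression parser to recover the same pairs, which is the practical difference for this use case.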
I know you will say 'what a stupid idea'. Airflow renders the DAG on its canvas. Correct. Now imagine you have to share the output with someone. With Airflow, we have to keep grabbing images of the canvas . . .
#airflow #sankeychart