Apache Airflow - if you are bored of Oozie & style
Abhishek Choudhary
Data Infrastructure Engineering in RWE/RWD | Healthtech DhanvantriAI
Apache Airflow is an Apache incubator project for workflow and job scheduling.
The DAG (directed acyclic graph) is the backbone of Airflow. Because a DAG has no cycles, following its edges can never bring you back to a vertex you have already visited, which rules out infinite loops.
In a workflow context, each task is a vertex and each directed edge says which task must come before which. Together the edges decide the order in which the tasks are performed.
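To make the vertex and edge picture concrete, here is a minimal sketch using only Python's standard library (`graphlib`, Python 3.9+), not Airflow itself. The task names and the `execution_order` helper are made up for illustration. It shows how the edges fix a run order and how a cycle makes an order impossible.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical workflow: each key is a task (vertex), each value is the set
# of tasks that must finish before it can start (incoming edges).
workflow = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}


def execution_order(dependencies):
    """Return one valid run order, or raise CycleError if a cycle exists."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(workflow)
print(order)

# Adding an edge from "report" back to "extract" creates a cycle, so no
# valid order exists and the sorter refuses to produce one.
cyclic = dict(workflow, extract={"report"})
try:
    execution_order(cyclic)
    has_cycle = False
except CycleError:
    has_cycle = True
print(has_cycle)
```

A scheduler built on a DAG can always find some task that is ready to run, which is exactly what the acyclic property guarantees.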
An Airflow Python script is really just a configuration file that specifies the DAG's structure as code.
The tasks defined in it run in a different context from the script itself. Different tasks run on different workers at different points in time, which means the script cannot be used to communicate between tasks.
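The isolation described above can be imitated without Airflow. The sketch below is only an analogy, assuming two hypothetical tasks that each run in their own Python process: a variable set in the first is not visible in the second, so the value has to travel through an external store (here a temporary JSON file chosen for the example).

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Two hypothetical tasks, given as source code so that each one can run in
# its own interpreter process, much like tasks on different workers.
PRODUCER = """
import json, sys
row_count = 42  # lives only in this process
with open(sys.argv[1], "w") as handle:
    json.dump({"row_count": row_count}, handle)
"""

CONSUMER = """
import json, sys
print("row_count" in globals())  # the producer's variable is not here
with open(sys.argv[1]) as handle:
    print(json.load(handle)["row_count"])  # the stored value is
"""


def run_task(source, store_path):
    """Run one task in a fresh Python process and return its stdout tokens."""
    result = subprocess.run(
        [sys.executable, "-c", source, str(store_path)],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.split()


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "handoff.json"
    run_task(PRODUCER, store)
    consumer_output = run_task(CONSUMER, store)

print(consumer_output)
```

The point is the design constraint rather than the file: anything one task wants another to see has to be written somewhere both can reach.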
Benefits -
- Airflow has a very powerful UI that gives you a lot of control over your workflows.
- Airflow workflows are written in Python, which makes them developer friendly. If you don't like config-style workflows, Airflow is the saviour.
- It is extremely easy to create a new workflow as a DAG.
- Centralized logging
- Great automation and scheduling options.
- State capture
- Automatic retry of failed tasks, depending on configuration
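The retry behaviour in the last bullet can be sketched in plain Python. This is not Airflow's implementation. The `run_with_retries` helper and `flaky_task` are hypothetical, and the parameter names `retries` and `retry_delay` are borrowed from Airflow's task-level settings of the same names purely for familiarity.

```python
import time


def run_with_retries(task, retries=2, retry_delay=0.0):
    """Call task(); on failure, try again up to `retries` more times.

    Returns the task's result together with the number of attempts used.
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            return task(), attempt
        except Exception:
            if attempt > retries:
                raise  # retries exhausted: surface the last failure
            time.sleep(retry_delay)


calls = {"count": 0}


def flaky_task():
    """Hypothetical task that fails twice before succeeding."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("transient failure")
    return "done"


result, attempts = run_with_retries(flaky_task, retries=2)
print(result, attempts)
```

With `retries=2` the task gets three attempts in total, so a task that fails twice still finishes. With `retries=1` the same task would raise after its second failure.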
I was using Oozie and needed something new and easier. Apache Airflow seems very promising.