Apache Airflow - if you are bored of Oozie & style

Apache Airflow is an Apache Incubator project for workflow and job scheduling.

The DAG (directed acyclic graph) is the backbone of Airflow. Because a DAG has no cycles, following its edges can never lead back to a vertex you have already visited, which rules out infinite loops.

In a workflow context, each task is a vertex and each directed edge represents a dependency between two tasks. Together the edges decide the order in which the tasks are performed.
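The idea can be tried out without Airflow at all. The following is a minimal sketch using Python's standard `graphlib` module (Python 3.9+), not Airflow's own scheduler; the task names are made up for illustration. It shows that the edges fix the run order, and that adding an edge that closes a loop is rejected.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical task names; each key maps a task to the tasks it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "notify": {"load"},
}

# static_order() yields every task only after all of its predecessors.
run_order = list(TopologicalSorter(dependencies).static_order())
print(run_order)

# Making "extract" depend on "notify" closes a loop, so the graph is
# no longer a DAG and no valid run order exists.
cyclic = dict(dependencies, extract={"notify"})
try:
    list(TopologicalSorter(cyclic).static_order())
    cycle_rejected = False
except CycleError:
    cycle_rejected = True
print(cycle_rejected)
```

Because the example graph is a simple chain, there is only one valid order; with branching dependencies several orders can be valid.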

An Airflow Python script is really just a configuration file that specifies the DAG's structure as code.

The actual tasks defined in the script run in a different context from the script itself. Different tasks run on different workers at different points in time, which means that the script cannot be used to communicate between tasks.
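To see why script-level variables cannot carry data between tasks, here is a rough sketch that imitates a worker by running a task body in a separate Python process. It uses only the standard library; the `shared` dictionary and the task body are hypothetical and only stand in for what a real Airflow worker would do.

```python
import subprocess
import sys

# Module-level state in the process that defines the workflow.
shared = {"row_count": None}

# Hypothetical task body. It runs in a fresh interpreter, much like a
# task that is picked up by a worker at some later time.
TASK_SOURCE = """
shared = {"row_count": None}   # the worker builds its own copy of the state
shared["row_count"] = 42       # and changes only that copy
print(shared["row_count"])
"""

result = subprocess.run(
    [sys.executable, "-c", TASK_SOURCE],
    capture_output=True,
    text=True,
    check=True,
)
task_output = result.stdout.strip()

print(task_output)          # value seen inside the task's own process
print(shared["row_count"])  # the defining script's copy is unchanged
```

Data that must pass from one task to another therefore needs an external channel such as a file, a database, or Airflow's XCom feature.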


Benefits -

  • Airflow has a very powerful UI that gives you a lot of control over your workflows.
  • Workflows are written in Python, which is developer friendly; if you don't like config-style workflows, Airflow is the saviour.
  • Creating a new workflow as a DAG is extremely easy.
  • Centralized logging
  • Great automation and scheduling options
  • Task state is captured
  • Automatic retry of failed tasks, depending on configuration


I was using Oozie and needed something new and easier. Apache Airflow seems very promising.

Danielle Felder

Marketing Communications Manager at Perforce Software

6 years ago

Really interesting. Would love if you added your review of Airflow to IT Central Station as well. Users interested in solutions like Airflow and Oozie also read reviews for Automic Workload Automation. This user, who notes that he switched to Automic from other open source solutions, writes, "We have a lot of jobs that have to run, and it's easy to see what the status is." You can read the rest of his review here: https://www.itcentralstation.com/product_reviews/automic-workload-automation-review-47481-by-jared-kessans/tzd/c366-sbc-185.
