Oozie

Oozie is a workflow scheduler system that manages Apache Hadoop jobs. Oozie workflow jobs are Directed Acyclic Graphs (DAGs) of actions.

Oozie runs workflows of dependent jobs and lets users define those workflows as Directed Acyclic Graphs. The actions in these DAGs can run in parallel or sequentially in Hadoop.
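To make the parallel-versus-sequential point concrete, here is a minimal Python sketch. It is not Oozie code: the action names and the `execution_waves` helper are hypothetical, and it only illustrates how a DAG of dependent jobs determines which actions must wait and which could run side by side.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def execution_waves(dependencies):
    """Group DAG nodes into waves.

    Nodes in the same wave have all their dependencies satisfied,
    so they could run in parallel; waves run one after another.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical workflow: each key maps to the actions it depends on.
workflow = {
    "ingest": [],
    "clean_logs": ["ingest"],
    "clean_users": ["ingest"],
    "join": ["clean_logs", "clean_users"],
}

if __name__ == "__main__":
    for number, wave in enumerate(execution_waves(workflow), start=1):
        print(number, wave)
```

In this sketch the two cleaning actions land in the same wave because neither depends on the other, while the join has to wait for both.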

This workflow scheduler system consists of two parts:

  • Workflow engine: stores and runs workflows composed of Hadoop jobs, including MapReduce, Pig and Hive jobs.
  • Coordinator engine: runs workflow jobs based on predefined schedules and the availability of data.
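The coordinator's two triggers, time and data availability, can be sketched roughly as follows. This is an illustration of the idea only, not how Oozie implements it; the function names, the path layout and the `path_exists` callback are all assumptions made for the example.

```python
from datetime import datetime, timedelta


def due_runs(start, end, frequency, now):
    """Return the scheduled (nominal) times in [start, end) that have arrived."""
    runs = []
    current = start
    while current < end and current <= now:
        runs.append(current)
        current += frequency
    return runs


def ready_to_run(nominal_time, now, required_paths, path_exists):
    """A run is ready once its scheduled time has arrived AND every
    required input path is reported as present by path_exists."""
    return nominal_time <= now and all(path_exists(p) for p in required_paths)


def input_path(nominal_time):
    """Hypothetical daily input directory for a given run."""
    return f"/data/logs/{nominal_time:%Y-%m-%d}"


if __name__ == "__main__":
    now = datetime(2024, 1, 2, 12, 0)
    available = {"/data/logs/2024-01-01"}  # stand-in for a real HDFS check
    for run in due_runs(datetime(2024, 1, 1), datetime(2024, 1, 4),
                        timedelta(days=1), now):
        ok = ready_to_run(run, now, [input_path(run)], available.__contains__)
        print(run.date(), "ready" if ok else "waiting for data")
```

A run whose scheduled time has passed but whose input is still missing stays in a waiting state until the data shows up.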

Oozie runs as a service in a Hadoop cluster, and clients submit workflow definitions to it for immediate or delayed processing.

An Oozie workflow consists of action nodes and control-flow nodes.

An action node is a workflow task, such as moving files into HDFS. A control-flow node controls the workflow execution between actions through constructs like conditional logic, so that different actions can follow depending on the result of earlier action nodes.
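Oozie workflows are defined in XML, so one way to see the two node types side by side is to generate a small definition. The sketch below uses Python's standard `xml.etree.ElementTree` to build a workflow with one decision (control-flow) node and one `fs` action node. The workflow name, node names and paths are placeholders invented for this example, and the schema version is an assumption; check the result against the schema your Oozie installation expects.

```python
import xml.etree.ElementTree as ET


def build_workflow():
    """Build a tiny workflow definition: start -> decision -> fs action -> end."""
    app = ET.Element("workflow-app", {
        "name": "demo-wf",                    # placeholder name
        "xmlns": "uri:oozie:workflow:0.5",    # assumed schema version
    })
    ET.SubElement(app, "start", {"to": "check-input"})

    # Control-flow node: choose the next node based on a condition.
    decision = ET.SubElement(app, "decision", {"name": "check-input"})
    switch = ET.SubElement(decision, "switch")
    case = ET.SubElement(switch, "case", {"to": "move-files"})
    case.text = "${fs:exists('/tmp/incoming')}"  # placeholder path
    ET.SubElement(switch, "default", {"to": "end"})

    # Action node: a file-system task that moves a directory.
    action = ET.SubElement(app, "action", {"name": "move-files"})
    fs = ET.SubElement(action, "fs")
    ET.SubElement(fs, "move", {"source": "/tmp/incoming", "target": "/data/landing"})
    ET.SubElement(action, "ok", {"to": "end"})      # transition on success
    ET.SubElement(action, "error", {"to": "fail"})  # transition on failure

    kill = ET.SubElement(app, "kill", {"name": "fail"})
    ET.SubElement(kill, "message").text = "Moving the input files failed"
    ET.SubElement(app, "end", {"name": "end"})
    return app


if __name__ == "__main__":
    tree = build_workflow()
    ET.indent(tree)  # pretty-print, Python 3.9+
    print(ET.tostring(tree, encoding="unicode"))
```

The `ok` and `error` transitions on the action are what let later nodes depend on the result of an earlier one.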

Oozie integrates with the rest of the Hadoop stack and supports several types of Hadoop jobs out of the box.

Features of Oozie include:

  • A command-line interface and a Java client API for launching, controlling and monitoring jobs.
  • Web Service APIs that let you control jobs from any machine that can reach the Oozie server.
  • Provisions for executing jobs that are scheduled to run periodically.
  • Provisions for sending email notifications upon completion of jobs.
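As a rough illustration of the Web Service APIs, the sketch below builds the URL for querying a job and, optionally, fetches it. The host, port and job ID are placeholders, the `/v1/job/<id>?show=info` path follows the Oozie Web Services API as documented for v1, and the helper names are hypothetical. Treat it as a sketch: a secured cluster would also need authentication that is not shown here.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def job_url(base_url, job_id, **params):
    """Build a job URL such as <base>/v1/job/<job_id>?show=info."""
    url = f"{base_url.rstrip('/')}/v1/job/{quote(job_id, safe='')}"
    if params:
        url += "?" + urlencode(params)
    return url


def fetch_job_info(base_url, job_id):
    """GET the job's info as JSON. Needs a reachable, unsecured Oozie server."""
    with urlopen(job_url(base_url, job_id, show="info")) as response:
        return json.load(response)


if __name__ == "__main__":
    # Placeholder server address and job ID; nothing is contacted here.
    print(job_url("http://oozie.example.com:11000/oozie",
                  "0000001-240101000000000-oozie-oozi-W", show="info"))
```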
