How to Use Oozie Workflow with HDInsight

Oozie is a workflow and coordination system that manages Hadoop jobs. It is integrated with the Hadoop stack and supports the following job types:

  • Apache MapReduce
  • Apache Pig
  • Apache Hive
  • Apache Sqoop

There are several ways to use Oozie workflows with HDInsight:

  1. Oozie REST API - Oozie jobs can be managed through the REST API, either with the Oozie command (a friendly interface over the REST API) or with custom code.
  2. Azure Data Factory (ADF) - Oozie jobs can be orchestrated through ADF as activities within ADF pipelines.
  3. Enterprise Security Package (ESP) - Oozie workflow definitions are written in Hadoop Process Definition Language (hPDL), an XML process definition language. A properties file is written to define the Oozie job and is then submitted to run on the cluster. Configuring an HDInsight cluster with ESP requires Azure Active Directory Domain Services (ADDS), which is not ready yet.
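
To make the "custom code" route in option 1 more concrete, here is a minimal Python sketch of how a job submission to the Oozie REST API might be assembled. It assumes the standard Oozie web services endpoint (POST to /v1/jobs with an XML configuration body). The cluster URL, user name, and workflow path are hypothetical placeholders, and authentication is deliberately left out. The request is only built, not sent.

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_job_config(properties):
    """Serialize job properties into the Hadoop-style XML configuration
    that the Oozie job submission endpoint expects as its request body."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="utf-8")


def build_submit_request(oozie_url, properties, start=True):
    """Build (but do not send) the POST request that submits a job.
    With start=True the job is asked to start right after submission."""
    url = oozie_url.rstrip("/") + "/v1/jobs"
    if start:
        url += "?action=start"
    return urllib.request.Request(
        url,
        data=build_job_config(properties),
        headers={"Content-Type": "application/xml;charset=UTF-8"},
        method="POST",
    )


# Hypothetical values: replace with your own cluster, user, and workflow path.
job_properties = {
    "user.name": "admin",
    "oozie.wf.application.path": "wasbs:///tutorials/useoozie",
}
request = build_submit_request(
    "https://CLUSTERNAME.azurehdinsight.net/oozie", job_properties
)
print(request.get_method(), request.full_url)
```

Sending the request would be a call to urllib.request.urlopen(request) once credentials are added. The properties here play the same role as the properties file mentioned in option 3: they tell Oozie where the hPDL workflow definition lives and who runs it.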


References:

  1. https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-use-oozie-linux-mac
  2. https://docs.microsoft.com/en-us/azure/data-factory/transform-data
  3. https://docs.microsoft.com/en-us/azure/hdinsight/domain-joined/hdinsight-use-oozie-domain-joined-clusters
  4. https://docs.microsoft.com/en-us/azure/hdinsight/domain-joined/apache-domain-joined-introduction
