Amazon Managed Workflows for Apache Airflow (MWAA)

Amazon MWAA is the managed version of Airflow offered by AWS. In this short document I collect findings from my own experience and from conversations with AWS Support that are not adequately covered in the official Airflow or AWS documentation.

Amazon Managed Workflows for Apache Airflow is a managed orchestration service for Apache Airflow that you can use to set up and operate data pipelines in the cloud at scale. Apache Airflow is an open-source tool used to programmatically author, schedule, and monitor sequences of processes and tasks, referred to as workflows. With Amazon MWAA, you can use Apache Airflow and Python to create workflows without having to manage the underlying infrastructure for scalability, availability, and security. Amazon MWAA automatically scales its workflow execution capacity to meet your needs and integrates with AWS security services to help provide fast and secure access to your data.


Airflow has a large and active open-source community that regularly contributes new functionality and integrations. Amazon MWAA supports existing Airflow workflows and integrations without changes to code, so migration is easy and the environment is familiar.

All of the components contained in the outer box (in the image below) appear as a single Amazon MWAA environment in your account. The Apache Airflow Scheduler and Workers are AWS Fargate (Fargate) containers that connect to the private subnets in the Amazon VPC for your environment. Each environment has its own Apache Airflow metadatabase managed by AWS that is accessible to the Scheduler and Workers Fargate containers via a privately-secured VPC endpoint.

Amazon CloudWatch, Amazon S3, Amazon SQS, and AWS KMS are separate from Amazon MWAA and need to be accessible from the Apache Airflow Scheduler(s) and Workers in the Fargate containers.
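One rough way to check this dependency is to test whether the regional endpoints of those services are reachable over HTTPS. The sketch below is only an illustration: the helper names `regional_endpoints`, `is_reachable`, and `report` are hypothetical, it assumes the standard `amazonaws.com` partition (not the China or GovCloud partitions), and it is only meaningful when run from a host inside the same private subnets as the environment, since the Fargate containers themselves are not directly accessible.

```python
import socket


def regional_endpoints(region):
    """Return standard regional endpoint hostnames for the services MWAA depends on.

    Assumes the standard AWS partition (amazonaws.com).
    """
    return {
        "CloudWatch Logs": f"logs.{region}.amazonaws.com",
        "CloudWatch metrics": f"monitoring.{region}.amazonaws.com",
        "S3": f"s3.{region}.amazonaws.com",
        "SQS": f"sqs.{region}.amazonaws.com",
        "KMS": f"kms.{region}.amazonaws.com",
    }


def is_reachable(host, port=443, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


def report(region):
    """Print one reachability line per service endpoint in the given region."""
    for service, host in regional_endpoints(region).items():
        status = "reachable" if is_reachable(host) else "NOT reachable"
        print(f"{service:20s} {host:40s} {status}")
```

Calling `report("us-east-1")` from a test instance in one of the environment's private subnets prints one line per service. A "NOT reachable" result usually points at a missing route, NAT gateway, or VPC endpoint, although an open TCP port alone does not prove that IAM permissions are correct.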

The Apache Airflow Web server can be accessed either over the Internet by selecting the Public network Apache Airflow access mode, or within your VPC by selecting the Private network Apache Airflow access mode. In both cases, access for your Apache Airflow users is controlled by the access control policy you define in AWS Identity and Access Management (IAM).
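As a minimal sketch of such an access control policy, the snippet below builds an IAM policy document that lets a principal sign in to the Airflow UI with a given Airflow role. It relies on the `airflow:CreateWebLoginToken` action and the `role/<environment>/<airflow-role>` resource format used by Amazon MWAA; the function name `web_login_policy`, the account ID, and the environment name are placeholders introduced for illustration only.

```python
import json

# Default Apache Airflow roles that an IAM principal can be mapped to.
AIRFLOW_ROLES = {"Admin", "Op", "User", "Viewer", "Public"}


def web_login_policy(region, account_id, environment_name, airflow_role="Viewer"):
    """Build an IAM policy document granting Airflow UI login with one Airflow role."""
    if airflow_role not in AIRFLOW_ROLES:
        raise ValueError(f"unknown Airflow role: {airflow_role!r}")
    resource = (
        f"arn:aws:airflow:{region}:{account_id}:"
        f"role/{environment_name}/{airflow_role}"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "airflow:CreateWebLoginToken",
                "Resource": resource,
            }
        ],
    }


# Placeholder values; replace with your own region, account, and environment.
policy = web_login_policy("us-east-1", "111122223333", "my-mwaa-env", "Viewer")
print(json.dumps(policy, indent=2))
```

The printed JSON can be attached to an IAM user, group, or role. Granting `Viewer` by default and handing out `Admin` only where needed keeps UI access aligned with least privilege.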
