How to Install Airflow With Docker in Minutes!

Link repo: https://github.com/ntd284/personal_install_airflow_docker.git

Get Airflow Up and Running with Docker in Minutes

These step-by-step instructions are easy to follow. You'll set up both the full and the lite versions of Airflow.

Contents

  • Introduction
  • Installing Airflow — Full Version
  • Installing Airflow — Lite Version
  • Conclusion

Introduction

Airflow is a community-driven platform designed to programmatically create, schedule, and oversee workflows.

Airflow is highly regarded as a top-tier workflow orchestration tool. Yet its installation can be unnecessarily complex.

This tutorial demystifies the setup with two straightforward installation approaches. By the end, you'll be ready to use Airflow in no time.


Airflow Installation — Full Version

In this tutorial, we'll use the Docker version for installation. I assume you already have Docker Desktop installed on your local machine. Let's verify that.

Also, check the containers currently running in Docker with:

The list should be empty on a fresh setup. Finally, check the Docker Compose version. Make sure you have the latest version of Docker Compose installed.
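Assuming Docker Desktop is installed, all three checks can be run from any terminal:

```shell
# Confirm the Docker engine is installed and reachable
docker --version

# List the containers currently running (expect an empty table on a fresh setup)
docker ps

# Confirm Docker Compose v2 is available
docker compose version
```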

Now, we're ready to install the full version of Airflow using Docker.

Let's start with the basics. First, download docker-compose.yaml from the Airflow website.
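The Airflow docs publish a compose file pinned to each release; for the 2.9.2 version used in this tutorial:

```shell
# Download the official docker-compose.yaml pinned to Airflow 2.9.2
curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.9.2/docker-compose.yaml'
```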

The docker-compose.yaml file includes the following service definitions:

  • airflow-scheduler: Manages and schedules tasks and DAGs.
  • airflow-webserver: Hosts the web interface, accessible at localhost:8080.
  • airflow-worker: Executes tasks assigned by the scheduler.
  • airflow-init: Initializes the Airflow setup.
  • flower: Monitors and provides insights into the environment, available at localhost:5555.
  • postgres: Serves as the database.
  • redis: Facilitates message forwarding from the scheduler to the workers.

Let’s look at the docker-compose.yaml file:

This installs Apache Airflow 2.9.2, the latest version at the time of writing.

Besides the common environment variables, the Airflow services share four volumes: dags, logs, config, and plugins.

We also need to create four matching folders in the local environment for volume synchronization:
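The four folders mirror the volumes listed above and can be created in one command from the folder that holds docker-compose.yaml:

```shell
# Create the host-side folders that the compose file mounts into the containers
mkdir -p ./dags ./logs ./config ./plugins
```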

We also need to make sure that file ownership and permissions match between the local environment and the Docker containers:

In the YAML file:

In the local environment:
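The stock compose file runs the Airflow containers as user `"${AIRFLOW_UID:-50000}:0"`, so recording the host user's UID in a `.env` file keeps ownership consistent on both sides (a `printf` variant of the command from the official quick start):

```shell
# Record the host user's UID so files written into the mounted volumes
# get the same owner inside and outside the containers
printf 'AIRFLOW_UID=%s\n' "$(id -u)" > .env
```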

That's all for the setup. Let's initialize the Airflow installation with Docker.
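Initialization is a one-off run of the airflow-init service:

```shell
# One-off initialization: runs the database migrations and creates the
# default admin account (username: airflow, password: airflow)
docker compose up airflow-init
```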

The initialization finishes successfully and creates an admin account for Airflow.

We're ready to start the Airflow services on Docker.
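Starting everything in detached mode keeps the terminal free, and `docker compose ps` gives a quick health check:

```shell
# Start every Airflow service in the background
docker compose up -d

# Verify that each container reports a healthy status
docker compose ps
```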

We can also check them in Docker Desktop.

Containers are up, healthy and running.

We can open localhost:8080 in a web browser. Both the username and the password are airflow.

Let's click Sign In and start exploring.

We've successfully installed the full version of Airflow in just a few minutes using Docker.


Airflow Installation — Lite Version

The full installation of Airflow uses multiple containers, consuming significant resources.

This setup is necessary for production with Kubernetes but not for local use.

To save resources, we'll modify the YAML file. First, ensure all previously running containers are stopped.

Delete all the images too:
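A single command, taken from the official quick start, stops the containers and removes the volumes and images together:

```shell
# Stop the containers and remove the volumes and downloaded images
docker compose down --volumes --rmi all
```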

After that, we'll repeat the setup in a similar way, this time in a new folder called airflow-lite:
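A minimal sketch of the new working directory, assuming the compose file is copied in by hand before editing:

```shell
# Work in a fresh folder so the full setup stays untouched
mkdir -p airflow-lite && cd airflow-lite

# Recreate the volume folders; place a copy of docker-compose.yaml
# in this folder before editing it
mkdir -p ./dags ./logs ./config ./plugins
```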

Let’s modify the YAML file.

1. Change CeleryExecutor to LocalExecutor:

2. Remove the other Celery environment variables:

3. Remove the redis service:

4. Remove the redis entry from depends_on:

5. Remove the airflow-worker service:

6. Remove the airflow-triggerer service:

7. Finally, remove the flower service.
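Taken together, the edits above boil down to a compose file along these lines (a sketch of only the affected keys, based on the stock 2.9.2 file; everything else stays unchanged):

```yaml
x-airflow-common:
  # ...
  environment:
    # 1. Single-machine executor instead of Celery
    AIRFLOW__CORE__EXECUTOR: LocalExecutor
    # 2. Celery variables (AIRFLOW__CELERY__RESULT_BACKEND,
    #    AIRFLOW__CELERY__BROKER_URL) are deleted
  depends_on:
    # 3-4. The redis service and the redis dependency are deleted;
    #      only postgres remains
    postgres:
      condition: service_healthy

# 5-7. The airflow-worker, airflow-triggerer, and flower services
#      are removed from the services section entirely.
```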

That's all for the lite version of the YAML file.

Don't forget to define the AIRFLOW_UID environment variable in the local environment, just as before:

We are ready to install the Airflow lite version.

First, initialize Airflow with docker compose up airflow-init.

Then start the services with docker compose up -d.

Containers are running and healthy in Docker Desktop:

Let's check the webserver at localhost:8080 and log in, again with airflow as both username and password.

We can run an example DAG to confirm everything works:

That’s all. We installed the Airflow lite version in minutes.

Based on the resource stats shown in the two screenshots:

We can have comparisons between Airflow Lite and Airflow Full versions:

Airflow Lite:

  • CPU Usage: 296.70% of 400% (4 cores allocated)
  • Memory Usage: 800.45MB of 7.49GB

Airflow Full:

  • CPU Usage: 185.07% of 400% (4 cores allocated)
  • Memory Usage: 3.03GB of 7.49GB

Airflow Lite uses significantly less memory compared to the Full version, although the CPU usage is higher.

Conclusion

  • Full Version for Production: Ideal for production with Kubernetes, includes all necessary services like Redis, workers, and Flower for robust task execution and monitoring.
  • Lite Version for Local Development: Uses fewer resources by excluding non-essential components. Perfect for local development and testing with simpler setup and maintenance.
  • Switching Between Setups: Flexible configuration allows easy transition from lite to full version. Start with lite for development, then move to full for production.
  • Benefits: Cost-effective development with lite version; scalable and robust production environment with full version.

