Getting Started with Docker: A Guide for Data Engineers

Welcome to the first post in our Docker series, written specifically for data engineers! This guide will help you get started with Docker, an essential tool for efficiently testing and deploying software.

Traditional vs. Dockerized Approach

Imagine you need to test a new version of MySQL. Traditionally, you would download the installer, follow a series of commands to set it up, configure it, and handle any compatibility issues with your existing setup. This process can be painful and time-consuming. Now, what if you could start any software with just one command? That's precisely what Docker allows you to do.

What We'll Cover

  1. What is Docker?
  2. Installing Docker
  3. Running a Docker Container
  4. Data Persistence with Docker Volumes
  5. Exposing a Docker Container to the Outside World
  6. Using Docker Compose

What is Docker?

Docker is an open platform for developing, shipping, and running applications. Docker enables you to separate your applications from your infrastructure so you can deliver software quickly. With Docker, you can manage your infrastructure in the same ways you manage your applications. By taking advantage of Docker’s methodologies for shipping, testing, and deploying code quickly, you can significantly reduce the delay between writing code and running it in production.

Benefits of Docker for Developers

  • Less dependency on infrastructure for testing apps
  • Easy to deploy apps with different versions
  • Simplifies testing any software

Benefits of Docker for Organizations

  • Streamlines app production
  • Ensures self-contained apps with consistent versions
  • Scales easily
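  • Typically lighter on resources than virtual machines (VMs), because containers share the host kernel instead of each booting a full guest operating system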

Installing Docker

Here's a step-by-step guide:

For Windows and macOS

  1. Go to Docker's official website.
  2. Download Docker Desktop for your OS.
  3. Run the installer and follow the on-screen instructions to complete the installation.

For Linux (Ubuntu example)

  1. Open your Terminal.
  2. Run the following commands:

sudo apt-get update
sudo apt-get install \
    ca-certificates \
    curl \
    gnupg \
    lsb-release

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg

echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io        

Installing the Docker SDK for Python (Optional)

  1. Open your Terminal.
  2. Run the following command:

pip install docker        

This will install the Docker SDK for Python, which you can use to interact with Docker from within your Python scripts.
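As a minimal sketch of what that looks like, the helper below lists every container with its status. It assumes the Docker daemon is running and the docker package is installed; the names summarize_containers and main are illustrative, not part of the SDK.

```python
def summarize_containers(client):
    """Return (name, status) pairs for every container, running or stopped."""
    # containers.list(all=True) mirrors `docker container ls --all`.
    return [(c.name, c.status) for c in client.containers.list(all=True)]


def main():
    # Imported here so the helper above can be reused without the SDK installed.
    import docker

    # from_env() builds a client from the local Docker environment settings.
    client = docker.from_env()
    for name, status in summarize_containers(client):
        print(f"{name}: {status}")
```

Calling main() on a machine with Docker running prints one line per container.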

How Docker Works

Docker revolves around five main components:

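  1. Docker Engine: The background service (daemon) and command-line client that build, run, and manage containers. It is loosely similar to the JVM for Java applications, in that it is the runtime every container needs.
  2. Docker Image: A packaged, read-only bundle of software and its dependencies. For example, to run a MySQL server, download the MySQL image from Docker Hub.
  3. Docker Container: A runnable instance of a Docker image. Containers are isolated from each other and the host system, ensuring a consistent runtime environment.
  4. Docker Hub: The default public registry for Docker images, similar to what GitHub is for source code. It lets you store and share images; other registries, including private ones, can be used as well.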
  5. Docker Compose: A tool for defining and running multi-container Docker applications. With Docker Compose, you can use a YAML file to configure your application’s services, networks, and volumes, making it easier to manage complex applications.

Steps to Run Docker

Let’s say you want to explore MySQL using Docker:

Step 1: Pull a Docker Image

docker pull mysql        

Step 2: Verify the Image

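Because no tag was specified, docker pull fetches the image tagged latest from Docker Hub and downloads only the layers that are missing or outdated locally. To confirm the image is now available, list it with docker image ls mysql.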
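To make the defaults concrete, here is a simplified sketch of how a short name such as mysql expands to a full image reference. The function name normalize_image_reference is hypothetical, and the logic is an approximation of Docker's rules (it ignores digests, for example), not Docker's actual implementation.

```python
def normalize_image_reference(reference):
    """Expand a short image name roughly the way `docker pull` resolves it."""
    # A colon is a tag separator only if it comes after the last slash,
    # so the port in localhost:5000/app is not mistaken for a tag.
    last_slash = reference.rfind("/")
    colon = reference.rfind(":")
    if colon > last_slash:
        name, tag = reference[:colon], reference[colon + 1:]
    else:
        name, tag = reference, "latest"  # no tag given: default to latest

    first, _, rest = name.partition("/")
    # Treat the first path component as a registry host if it looks like one.
    if rest and ("." in first or ":" in first or first == "localhost"):
        registry, path = first, rest
    else:
        registry, path = "docker.io", name  # Docker Hub is the default registry
        # Official images on Docker Hub live under the "library" namespace.
        if "/" not in path:
            path = "library/" + path
    return f"{registry}/{path}:{tag}"


print(normalize_image_reference("mysql"))  # docker.io/library/mysql:latest
```

Pinning an explicit tag, for example mysql:8.0, is usually safer than relying on latest, because the image behind latest changes over time.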

Step 3: Run a Docker Container

docker container run --name c1 -e MYSQL_ROOT_PASSWORD=my-secret-pw -d mysql        

This command starts a MySQL container named c1 in the background (-d, detached mode) and sets the root password through the MYSQL_ROOT_PASSWORD environment variable (-e).
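If you prefer to drive Docker from Python, the same step can be sketched with the Docker SDK for Python. The wrapper run_mysql is a hypothetical helper; it assumes you pass in a client created with docker.from_env().

```python
def run_mysql(client, name, root_password):
    """Start a detached MySQL container, like `docker container run -d`."""
    return client.containers.run(
        "mysql",
        name=name,  # same role as --name
        environment={"MYSQL_ROOT_PASSWORD": root_password},  # same role as -e
        detach=True,  # return immediately with a Container object
    )
```

For example, run_mysql(docker.from_env(), "c1", "my-secret-pw") should be roughly equivalent to the command above.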

Step 4: Interact Inside the Container

docker container exec -it c1 bash        

This command opens an interactive Bash shell inside the running container. To start the MySQL client (enter the root password you set earlier when prompted):

mysql -u root -p        

Exit the container with:

exit        

Difference Between Container and Image

  • Image: Read-only template with instructions to create a Docker container.
  • Container: A runnable instance of an image. Each container gets its own writable layer, which survives a stop and restart but is deleted along with the container, so any data written there is lost once the container is removed.

Data Persistence with Docker Volumes

To avoid data loss, mount a volume to store data persistently. For example:

docker container run --name c2 --mount source=mysqldata,target=/var/lib/mysql/ -e MYSQL_ROOT_PASSWORD=mysecretpw -d mysql        

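This command mounts the named volume mysqldata at /var/lib/mysql, the directory where MySQL keeps its data files. Docker stores the volume on the host machine, outside the container's writable layer, so the data survives even if the container is removed.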
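To see where that data actually lives, you can ask Docker for the volume's mount point. The sketch below shells out to docker volume inspect and reads the Mountpoint field from its JSON output; the helper names are illustrative, and it assumes the docker CLI is on your PATH.

```python
import json
import subprocess


def mountpoint_from_inspect(inspect_output):
    """Extract the host directory from `docker volume inspect` JSON output."""
    volumes = json.loads(inspect_output)  # the command prints a JSON array
    return volumes[0]["Mountpoint"]


def volume_mountpoint(name="mysqldata"):
    """Ask the Docker CLI where a named volume is stored."""
    result = subprocess.run(
        ["docker", "volume", "inspect", name],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the volume does not exist
    )
    return mountpoint_from_inspect(result.stdout)
```

On Docker Desktop for Windows and macOS, the reported path is inside Docker's Linux virtual machine rather than directly on your file system.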

Exposing a Docker Container to the Outside World

To connect to MySQL from outside the container, forward the container's port to a port on your host machine:

docker container run --name c3 -p 33360:3306 --mount source=mysqldata,target=/var/lib/mysql/ -e MYSQL_ROOT_PASSWORD=mysecretpw -d mysql        

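Now you can reach MySQL on localhost at port 33360. Note that c3 reuses the mysqldata volume, so stop and remove c2 first; two MySQL servers should not share one data directory. Because the volume is already initialized, the root password is the one that was set when the volume was first populated.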
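A quick way to check the port mapping from the host is to attempt a TCP connection. This is a minimal sketch using only the standard library; port_is_open is a hypothetical helper name.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, port_is_open("localhost", 33360) should return True once c3 is up. A successful connection only shows that the published port accepts connections; MySQL itself may still need a few seconds to finish starting.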

Example: Querying MySQL with Apache Spark

# Requires the MySQL Connector/J JDBC driver on Spark's classpath.
jdbcDF = (
    spark.read.format("jdbc")
    .option("url", "jdbc:mysql://localhost:33360")
    .option("user", "root")
    .option("password", "mysecretpw")
    .option("dbtable", "mysql.user")
    .load()
)

Using Docker Compose

For multi-container applications, Docker Compose simplifies the process. Here’s a basic docker-compose.yml example to run MySQL and Adminer (a database management tool) together:

version: '3.1'

services:

  db:
    image: mysql
    restart: always
    environment:
      MYSQL_ROOT_PASSWORD: mysecretpw
    volumes:
      - mysqldata:/var/lib/mysql

  adminer:
    image: adminer
    restart: always
    ports:
      - 8080:8080

volumes:
  mysqldata:        

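Run the application from the directory containing docker-compose.yml (with the newer Compose plugin, the equivalent command is docker compose up):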

docker-compose up        

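This configuration starts a MySQL database and an Adminer interface, accessible at http://localhost:8080. On the Adminer login page, enter db (the Compose service name) as the server, since the two containers reach each other by service name on the network Compose creates.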

Conclusion

With Docker, you can run applications in isolated environments, ensure data persistence using volumes, and expose applications to the outside world using port forwarding. Docker Compose further simplifies managing multi-container applications. This approach streamlines testing, deployment, and scaling of applications.

Stay tuned for the next post, where we will discuss building a custom Docker image from a Dockerfile!

Happy Dockering!
