Getting Started with Docker: A Guide for Data Engineers
Rana Sheharyar
Building Data, Analytics, and AI Engineering teams at CYBRNODE | We are hiring!
Welcome to the first blog in our Docker series specifically tailored for data engineers! This guide will help you get started with Docker, an essential tool for efficiently testing and deploying software.
Traditional vs. Dockerized Approach
Imagine you need to test a new version of MySQL. Traditionally, you would download the installer, follow a series of commands to set it up, configure it, and handle any compatibility issues with your existing setup. This process can be painful and time-consuming. Now, what if you could start any software with just one command? That's precisely what Docker allows you to do.
What We'll Cover
What is Docker?
Docker is an open platform for developing, shipping, and running applications. Docker enables you to separate your applications from your infrastructure so you can deliver software quickly. With Docker, you can manage your infrastructure in the same ways you manage your applications. By taking advantage of Docker’s methodologies for shipping, testing, and deploying code quickly, you can significantly reduce the delay between writing code and running it in production.
Benefits of Docker for Developers
Benefits of Docker for Organizations
Installing Docker
Here's a step-by-step guide:
For Windows and macOS
For Linux (Ubuntu example)
sudo apt-get update
sudo apt-get install \
ca-certificates \
curl \
gnupg \
lsb-release
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io
Step 2: Install Docker SDK for Python
pip install docker
This will install the Docker SDK for Python, which you can use to interact with Docker from within your Python scripts.
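As a minimal sketch of what that interaction can look like, the snippet below asks the Docker daemon for its version through the SDK. It assumes the daemon is running and reachable through the default environment settings. `describe_daemon` and `main` are hypothetical helper names chosen for this example, not part of the SDK.

```python
def describe_daemon(client):
    """Summarise the Docker daemon that `client` is connected to."""
    info = client.version()  # dict of version details reported by the daemon
    return f"Docker {info['Version']} (API {info['ApiVersion']})"


def main():
    import docker  # the package installed by `pip install docker`

    # from_env() reads connection settings such as DOCKER_HOST from the environment.
    client = docker.from_env()
    print(describe_daemon(client))
```

Calling `main()` with Docker running should print a single line describing the daemon. If it raises a connection error, the daemon is probably not started yet.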
How Docker Works
Docker revolves around five main components:
Steps to Run Docker
Let’s say you want to explore MySQL using Docker:
Step 1: Pull a Docker Image
docker pull mysql
Step 2: Verify the Image
When you pull an image, Docker compares your local copy with the one on Docker Hub and downloads any layers that are missing or outdated. To confirm that the image is now available locally, run docker image ls mysql.
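The local-copy check can also be sketched in Python with the Docker SDK (`pip install docker`). This is an illustration under assumptions, not how Docker itself implements the check: `ensure_image` is a hypothetical helper, and it only looks for a matching tag, so it will not notice that a local image is older than the one on Docker Hub.

```python
def ensure_image(client, repository, tag="latest"):
    """Return repository:tag, pulling it only if no local image has that tag."""
    wanted = f"{repository}:{tag}"
    for image in client.images.list(name=repository):
        if wanted in image.tags:
            return image  # already present locally, nothing to download
    # Not found locally: download it from the registry (Docker Hub by default).
    return client.images.pull(repository, tag=tag)
```

With a real client you would call it as `ensure_image(docker.from_env(), "mysql")`.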
Step 3: Run a Docker Container
docker container run --name c1 -e MYSQL_ROOT_PASSWORD=my-secret-pw -d mysql
This command starts a MySQL container named c1 in the background (-d) and sets the root password through the MYSQL_ROOT_PASSWORD environment variable (-e).
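For comparison, here is a rough Python equivalent using the Docker SDK (`pip install docker`). It is a sketch that assumes the mysql image can be pulled and that no container named c1 exists yet. `start_mysql` is a hypothetical helper name.

```python
def start_mysql(client, name="c1", root_password="my-secret-pw"):
    """Start a detached MySQL container, similar to `docker container run -d`."""
    return client.containers.run(
        "mysql",
        name=name,
        environment={"MYSQL_ROOT_PASSWORD": root_password},
        detach=True,  # return a Container object instead of waiting for exit
    )
```

With a real client, `start_mysql(docker.from_env())` returns a Container object whose `name` and `status` attributes you can inspect.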
Step 4: Interact Inside the Container
docker container exec -it c1 bash
This command opens an interactive Bash shell inside the running container. To start the MySQL client from that shell:
mysql -u root -p
Exit the container with:
exit
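If you would rather run a single command than open a shell, the Docker SDK for Python (`pip install docker`) offers `exec_run` on a container. The sketch below assumes the container c1 is running, that MySQL has finished initialising, and that the root password is the one used earlier. `mysql_version` is a hypothetical helper name.

```python
def mysql_version(client, name="c1", root_password="my-secret-pw"):
    """Run `SELECT VERSION()` inside the container and return the result."""
    container = client.containers.get(name)
    command = ["mysql", "-uroot", f"-p{root_password}", "-N", "-e", "SELECT VERSION();"]
    # demux=True returns stdout and stderr separately, so the client's
    # password warning on stderr does not mix with the query result.
    exit_code, (out, err) = container.exec_run(command, demux=True)
    if exit_code != 0:
        raise RuntimeError((err or b"").decode())
    return (out or b"").decode().strip()
```

Passing the password on the command line is acceptable for a local experiment like this one, but avoid it on shared machines.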
Difference Between Container and Image
Data Persistence with Docker Volume
To avoid data loss, mount a volume to store data persistently. For example:
docker container run --name c2 --mount source=mysqldata,target=/var/lib/mysql/ -e MYSQL_ROOT_PASSWORD=mysecretpw -d mysql
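This command mounts the named volume mysqldata at /var/lib/mysql, the directory where MySQL keeps its data. The data is written to a Docker-managed location on the host, so it survives even if the container is removed.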
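To see where that data actually lives, you can ask Docker for the volume's mount point. The sketch below uses the Docker SDK for Python (`pip install docker`) and assumes the mysqldata volume already exists. `volume_mountpoint` is a hypothetical helper name.

```python
def volume_mountpoint(client, name="mysqldata"):
    """Return the host path where Docker stores the named volume's data."""
    volume = client.volumes.get(name)
    # `attrs` holds the same fields that `docker volume inspect` prints.
    return volume.attrs["Mountpoint"]
```

On Docker Desktop for Windows and macOS, the returned path is inside Docker's virtual machine rather than directly on your file system.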
Exposing a Docker Container to the Outside World
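To connect to MySQL from outside the container, publish the container's port 3306 on a port of your host machine. Stop c2 first (docker container stop c2), because two MySQL servers cannot safely share the same data volume: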
docker container run --name c3 -p 33360:3306 --mount source=mysqldata,target=/var/lib/mysql/ -e MYSQL_ROOT_PASSWORD=mysecretpw -d mysql
Now, you can access MySQL on your local host at port 33360.
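A quick way to confirm that the port forwarding works is to open a TCP connection to the published port. The sketch below uses only the Python standard library. `port_is_open` is a hypothetical helper name, and the defaults assume the host port 33360 used in this example.

```python
import socket


def port_is_open(host="localhost", port=33360, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or host unreachable
        return False
```

A `True` result only shows that something is listening on the port. MySQL may still need a few seconds after the container starts before it accepts logins.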
Example: Querying MySQL with Apache Spark
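# `spark` is the SparkSession (predefined in the pyspark shell).
# The MySQL Connector/J JAR must be on Spark's classpath, e.g. via --jars.
jdbcDF = (
    spark.read.format("jdbc")
    .option("url", "jdbc:mysql://localhost:33360")
    .option("user", "root")
    .option("password", "mysecretpw")
    .option("dbtable", "mysql.user")
    .load()
)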
Using Docker Compose
For multi-container applications, Docker Compose simplifies the process. Here’s a basic docker-compose.yml example to run MySQL and Adminer (a database management tool) together:
version: '3.1'
services:
  db:
    image: mysql
    restart: always
    environment:
      MYSQL_ROOT_PASSWORD: mysecretpw
    volumes:
      - mysqldata:/var/lib/mysql
  adminer:
    image: adminer
    restart: always
    ports:
      - 8080:8080
volumes:
  mysqldata:
Run the application with:
docker-compose up
This configuration sets up a MySQL database and an Adminer interface accessible at http://localhost:8080.
Conclusion
With Docker, you can run applications in isolated environments, ensure data persistence using volumes, and expose applications to the outside world using port forwarding. Docker Compose further simplifies managing multi-container applications. This approach streamlines testing, deployment, and scaling of applications.
Stay tuned for the next blog where we will discuss building a custom Docker image from a Dockerfile!
Happy Dockering!