HOW DOCKER WORKS
Koushik Dutta
Senior Data Engineer @ Deloitte Engineering | Distributed Systems | AWS Certified, Data Analytics
Now that we have learned what containerization is and how containers work, it's time to face the ultimate truth: Docker is simply containerization software, and the Docker engine is simply a container engine.
The Docker engine consists of the Docker daemon and other utilities to create, destroy, and manage containers. The Docker daemon is a background process that receives commands from a local or remote Docker client (the CLI) over an HTTP REST API and manages containers accordingly. Docker is therefore said to follow a client-server architecture, in which the server is the Docker daemon.
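You can see this client-server split directly by talking to the daemon's REST API yourself. A minimal sketch, assuming the daemon is listening on its default Unix socket (the socket path and API version may differ on your system):

# Ping the daemon, then ask it to list running containers --
# the same REST call the CLI makes for "docker ps".
curl --unix-socket /var/run/docker.sock http://localhost/_ping
curl --unix-socket /var/run/docker.sock http://localhost/v1.41/containers/json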
When you install Docker on your system, you get the Docker engine, the Docker command-line interface (the Docker client), and other utilities, including a GUI on some platforms. When you start Docker, it starts the Docker daemon.
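You can confirm both halves are present with the commands below (the systemctl line assumes a systemd-based Linux distribution):

docker version            # prints a "Client" section and a "Server: Docker Engine" section
systemctl status docker   # shows the dockerd daemon running in the background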
What is a Docker container?
The container we have discussed so far is a general interpretation of what a container is and how it works. A Docker container is more sophisticated than that.
A Docker container contains application code along with its other dependencies, and these other dependencies are what make a container a "container". They consist of the (application-specific) libraries, binaries, and other resources that our application needs in order to function.
An example of a container would be a Node.js server. Our application code would consist of server.js plus the node_modules directory of libraries. But to run it, we need Node installed in the container, so we need the node binary. Node may in turn depend on other binaries and libraries, so we need those too. Finally, Node expects an operating system environment to run in, for example CentOS, so the image also carries that OS's userland binaries and libraries; the kernel itself is shared with the host, and the Docker engine mediates between the container and that kernel.
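A minimal sketch of a Dockerfile for such a Node.js server (the node:18-alpine tag and the file names here are illustrative, not from the original example):

FROM node:18-alpine        # base image: the node binary plus an Alpine userland
WORKDIR /app
COPY package*.json ./
RUN npm install            # installs node_modules inside the image
COPY server.js .
CMD ["node", "server.js"]  # default process when a container starts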
What is a Docker image?
The Node server example we just talked about contains many pieces that need to be present in the container for our application to work. A Docker image is, conceptually, a zipped box that contains all these pieces.
We instruct the Docker client to create a container from this image. The Docker client instructs the Docker daemon to unpack the image, read its contents, and launch a container with server.js executing as a process. Depending on other instructions in the image, the Docker daemon might also expose some ports from the container for us to listen on, mount volumes, and do other things.
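Sketched as a CLI command (the image name, port numbers, and paths are illustrative):

# Publish container port 3000 on host port 8080 and mount a host directory as a volume.
# "my-node-app" is a hypothetical image built from the Dockerfile above.
docker run -d --name web -p 8080:3000 -v "$(pwd)/data:/app/data" my-node-app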
To create a Docker image, we need a Dockerfile. A Dockerfile is a configuration file whose instructions tell the Docker engine how to build an image: what the base image is, what the working directory inside the container's OS should be, which application-specific files need to be copied in from the host, which ports need to be exposed in the container, and dozens of other things.
A base image is an existing image, often an official one from Docker Hub, on top of which we add our application-specific code and instructions. A base image might, for example, contain the CentOS operating system with the Apache server installed.
A Docker image uses a union file system such as AuFS (modern installations typically use OverlayFS). Each instruction in the Dockerfile creates a read-only layer. These layers are stacked on each other in the order given in the Dockerfile, and each layer is only a set of differences from the layer before it.
When we create a container from this image, Docker stacks all these read-only layers and adds a new read-write layer on top. The read-only layers are called image layers, while the thin read-write layer in the container is called the container layer.
A typical Dockerfile looks like this:
FROM ubuntu:15.04
COPY . /app
RUN make /app
CMD python /app/app.py
In the above Dockerfile, we create our image from the base Ubuntu image, version 15.04 (provided by Docker Hub), which creates the first layer. Then we copy everything from the current directory to the /app location in the Ubuntu filesystem, which creates a new layer stacked on the previous one. Then we build the application with the make command, whose output is written to another new layer stacked on top. Finally, the CMD instruction specifies the python command to run when a container starts; this last instruction takes no space in the image because it only records metadata and does not change the filesystem.
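Building the image and inspecting its layers (the tag "myapp" is just an example):

docker build -t myapp .   # each filesystem-changing instruction produces a layer
docker history myapp      # lists the layers; metadata-only steps such as CMD show 0B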
When we run a container, Docker creates a read-write layer on top of the image layers. All changes made to the running container, such as writing new files, modifying existing files, and deleting files, are written to this thin writable container layer.
When a container is running, the container layer and the layers below it must be combined, merging the differences in each layer to present a single, unified file system. This is done by storage drivers provided by the Docker engine.
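You can check which storage driver your engine uses; the Go-template field below comes from the docker info output and typically prints something like overlay2 or aufs:

docker info --format '{{.Driver}}'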
When a layer (including the container layer) needs to read a file that lives in a layer below it, it reads the file from that layer directly. While building an image, when a layer needs to modify a file from a layer below it, that file is copied up into the current layer and the changes are made there (only the diff is saved in the layer).
In a running container, when the container layer wants to write to a file from a layer below it, that file is copied up into the container layer and the changes are made to the copy. This strategy of copying a file only when we want to write to (modify) it is called copy-on-write (CoW).
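You can watch copy-on-write happen with docker diff, which lists exactly what a container's writable layer has added, changed, or deleted (the container name and file path here are illustrative):

docker run -d --name cow-demo nginx
docker exec cow-demo sh -c 'echo "# tweak" >> /etc/nginx/nginx.conf'
docker diff cow-demo   # prints "C /etc/nginx/nginx.conf": the file was copied up and changed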
This is what makes the writable layer lightweight, hence we call it the thin layer. All modifications made on top of the image layers live in this writable container layer. When the container is destroyed, the container layer is destroyed too, but the image layers are preserved as they are. If we want to, we can still save the writable layer of a container, which gives us a persistent Docker container.
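One way to preserve a container's writable layer is to commit it, which folds the container layer into a new image (continuing the hypothetical cow-demo example above):

docker commit cow-demo my-snapshot:v1   # new image = old image layers + the container layer
docker run -d my-snapshot:v1            # the changes made in cow-demo are now baked in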
Multiple containers can share some or all file system layers from one or many images. Since each layer is identified by a digest that is a checksum of the layer's content, layers are very reusable. If two containers are made from the same image, they share 100% of the image layers and each has only its own unique writable layer.
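The SIZE column of docker ps -s makes this visible: it reports each container's own writable layer, with the shared image layers counted once under the "virtual" size (nginx here is just an example image):

docker run -d --name c1 nginx
docker run -d --name c2 nginx
docker ps -s   # e.g. "2B (virtual 187MB)": tiny per-container layers, one shared image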
Having a layered file system with a copy-on-write (CoW) strategy, along with layer reusability, is what makes Docker containers so blazingly fast to create. It is also why containers are lightweight and take up so little space on disk (only the size of the writable layer).
How does Docker work on non-Linux platforms?
If you have survived till here, then I have one question for you: if Docker is based on Linux Containers (LXC), how does Docker work on other kernels, such as Darwin for macOS and Windows NT for Windows?
Docker originally used Linux Containers (LXC) and was designed for the Linux kernel only, so any Linux-based operating system could use it as is. For Windows and macOS, Docker ran the Docker engine inside a Linux virtual machine (originally using VirtualBox); on those platforms you had to install Docker Desktop, which took care of the virtualization.
But the latest version of Docker on Linux uses runC (which grew out of libcontainer) and follows the OCI (Open Container Initiative) specifications. runC is a CLI tool for spawning and running containers in the same operating system as its host. On Windows, Docker uses Hyper-V, the virtualization technology built into Windows. On macOS, Docker uses Apple's Hypervisor framework for virtualization.
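On a Linux host you can confirm the runtime in use; the Go-template field below comes from the docker info output (it may vary by Docker version) and typically prints runc:

docker info --format '{{.DefaultRuntime}}'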