Tearing apart Docker...
SAKSHAM TRIVEDI
Security Engineer || Microsoft Certified Security Operations Center Analyst
In this article I am going to talk about what Docker is, why it is so popular, and how it works under the hood.
So, what is Docker?
According to the official documentation,
" Docker is an open platform for developing, shipping, and running applications. Docker enables you to separate your applications from your infrastructure so you can deliver software quickly."
The Docker Platform...
Basically, Docker provides the ability to package and run an application in a loosely isolated environment called a container.
Each container is isolated from the others; this isolation and security allows you to run many containers simultaneously on a given host.
But... what are containers???
So, containers can be understood as standardized, lightweight, standalone executable packages that use application-level virtualization and bundle the application source code together with everything required to run it: the runtime environment, settings, system tools, libraries, and dependencies. Since each one acts as an isolated, single-unit executable component that runs on top of any platform and behaves like an actual shipping container at runtime, it was coined a container.
Containerization...
In computer science, containerization is defined as the packaging of an application program together with all of its dependencies, and running them in an isolated manner using application-level virtualization.
This imparts several benefits, like speedy execution, isolation, application security, enhanced reliability, and platform independence. "Docker container images, for example, become containers as soon as they start running on the Docker Engine. Irrespective of the infrastructure, containerized software applications run the same. Containers isolate applications from the environment, ensuring that they execute uniformly."
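A one-line sketch of the idea (assuming Docker is installed and the public `alpine` image is used): the same command produces the same environment on any host.

```
# Run a throwaway Alpine Linux container and execute a command inside it.
# --rm removes the container when it exits, so nothing is left behind.
docker run --rm alpine echo "hello from an isolated container"
```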
Application-Level Virtualization vs Virtual Machines
In a nutshell, virtualization abstracts the physical hardware, such as processor, memory, storage, and networking, and turns these physical resources into logical resources using specially designed software known as a hypervisor. The hypervisor runs on top of the physical infrastructure and provides a virtual environment in which you can deploy and run multiple virtual instances, also known as virtual machines (VMs). Each virtual instance contains a guest operating system, applications, and whatever dependencies it requires. Depending on the hypervisor, there may be an additional host operating system layer between the physical hardware and the hypervisor on which the VMs run (Type 2), or the hypervisor may be installed directly on top of the hardware (Type 1). You can learn more about hypervisors here.
In application-level virtualization, on the other hand, also known as operating-system-level virtualization, applications are packed into isolated spaces, known as containers, that share a common operating system. It virtualizes the operating system itself, so that multiple workloads can run on a single OS platform by sharing resources directly from the operating system (the kernel) with the containers. Each container runs in an isolated environment (unless explicitly exposed), and because no guest OS is required, a container takes much less space than a VM, typically ranging from a few megabytes up to a few gigabytes. Thanks to this reduced size and shared-resource architecture, many containers can easily run on a single platform. It also reduces management overhead: since containers share a common operating system, only a single operating system needs to be managed.
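A quick way to see this sharing in practice (a minimal sketch, again assuming the `alpine` image) is to compare kernel versions on the host and inside a container: they match, because the container never boots its own kernel.

```
# The container shares the host kernel instead of booting a guest OS,
# so both commands report the same kernel version.
uname -r                          # kernel version on the host
docker run --rm alpine uname -r   # kernel version seen from inside a container
```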
That said, you may also have seen scenarios where containers are deployed on top of VMs, combining both approaches.
Now that we know what Docker is and how it works, let's move on to why Docker is faster than traditional virtualization.
Disclaimer: the rest of this article goes more in depth and assumes readers have a solid grasp of Linux concepts.
Once we're familiar with the underlying technologies, Docker will seem less mystical. So, in order to understand how it works under the hood, let's take a step back.
As we know, an operating system is software that runs on hardware. I like to call it a "Supervisor", which I think is a fantastic analogy for what the operating system does.
"Think of the operating system as a referee or administrator, watching over the other programs. The operating system lets other programs run while coordinating the scheduling and execution of those programs."
Before moving further, let's talk about virtualizing a processor. A processor runs machine language (compiled assembly code) when running a program. Assembly and machine languages need to conform to the processor's instruction set, so an Intel processor will run x86 machine language while your smartphone runs ARM machine language.
Consider the example of running an x86 program on an ARM processor.
Here is a simplified example:
The x86 instruction `sti`, or set interrupt flag (in hex, `fb`), is not understood by the ARM processor. This would normally result in a crash.
However, the processor can trap particular errors; in effect, "if you see something like this, hand it off somewhere else." In this case, it traps the error and sends the instruction to the Virtual Machine Manager.
The Virtual Machine Manager knows how to emulate the `sti` instruction, so it simulates the instruction and the VM keeps running without error.
But all of that trapping and emulation takes time and adds latency to the application's overall response. On top of that, it is not just the application's instructions that are simulated; the guest operating system's own instructions add to the latency as well, resulting in slow, degraded performance. Docker sidesteps this problem entirely; let's dive into how.
Tearing apart Docker...
If you have used Docker before, you might be familiar with the docker command. If not, no need to worry: Docker ships a command-line client called `docker`, through which most Docker operations are performed.
A Docker image is an artifact created by running the `docker build` command on a given Dockerfile, which is the script used to create a Docker image. Artifacts, or Docker images, can be stored in private or public repositories called registries. Some common registries are Docker Hub, quay.io, and AWS ECR.
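As a rough sketch of that workflow (the image name and registry address below are placeholders, not real endpoints):

```
# Build an image from the Dockerfile in the current directory.
docker build -t myapp:1.0 .

# Tag the image for a registry and push it there.
# "registry.example.com/team" is a placeholder for your own registry namespace.
docker tag myapp:1.0 registry.example.com/team/myapp:1.0
docker push registry.example.com/team/myapp:1.0
```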
To put the three pieces together: a Dockerfile is the code file that defines what the image will contain and which commands it will execute; a Docker image is the packaged environment built from that file, just like a CD; and a Docker container is the execution environment, the run-time instance of that image.
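A minimal end-to-end sketch (the Dockerfile contents and image name here are made up for illustration):

```
# Write a tiny Dockerfile: start from Alpine and define the command to run.
cat > Dockerfile <<'EOF'
FROM alpine
CMD ["echo", "built from a Dockerfile, running as a container"]
EOF

# The Dockerfile describes the image; `docker build` produces the image (the "CD");
# `docker run` starts a container, the run-time instance of that image.
docker build -t demo-image .
docker run --rm demo-image
```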
cgroups & namespaces
The backbone of the Docker technology is cgroups (short for control groups) and kernel namespaces, both of which are already provided by the Linux kernel.
Using cgroups, the Linux operating system can easily manage and monitor resource allocation for a given process and set resource limits, like CPU, memory, and network limits. The kernel can now control the maximum amount of resources a process gets. This lets the Docker engine only give out 50% of the computer's memory, processors, or network, for example, to a running Docker container.
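For example, a one-liner (assuming the `alpine` image) asking the Docker engine, and therefore cgroups, to cap a container's resources:

```
# cgroups in action: cap this container at half a CPU core and 256 MB of RAM.
# The kernel enforces these limits, not the application inside the container.
docker run --rm --cpus="0.5" --memory="256m" alpine sh -c "echo limited by cgroups"
```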
Namespaces are helpful in isolating process groups from each other. There are six default namespaces in Linux: mnt, IPC, net, user, pid, and uts. Each container will have its own namespace, and processes running inside that namespace will not have access to anything outside it.
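A quick way to see PID-namespace isolation (again assuming the `alpine` image): inside the container, the process tree starts over at PID 1 and no host processes are visible.

```
# Inside its own PID namespace, the container only sees its own processes.
docker run --rm alpine ps aux

# On the host, the kernel exposes each process's namespaces under /proc/<pid>/ns.
ls -l /proc/$$/ns
```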
Docker containers also have network isolation (via libnetwork), allowing for separate virtual interfaces and IP addressing between containers.
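A small sketch of that network isolation (the network name `demo-net` is a placeholder):

```
# Create a user-defined bridge network and attach a container to it.
# The container gets its own virtual interface and IP address on that network.
docker network create demo-net
docker run --rm --network demo-net alpine ip addr show
docker network rm demo-net
```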
Union File System
Docker uses the union file system to create and layer Docker images. This means all images are built on top of a base image, and actions are then added as layers on top of that base image. For example, `RUN apt install curl` creates a new layer. Under the hood, the union file system allows files and directories of separate file systems, known as branches, to be transparently overlaid, forming a single coherent file system. When branches contain the same directory, their contents are merged and appear together in that directory.
When building an image, Docker saves these branches in its cache. The benefit: if Docker detects that the command to create a branch or layer already exists in the cache, it re-uses the cached branch instead of re-running the command, which is a cache hit. This is known as Docker layer caching.
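A small sketch of layering and caching (the file names here are made up; the second, unchanged build should reuse every cached layer and finish almost instantly):

```
# A hypothetical application file to copy into the image.
printf '#!/bin/sh\necho hello from the app\n' > app.sh && chmod +x app.sh

cat > Dockerfile <<'EOF'
# Base image layer
FROM ubuntu:22.04
# New layer: install curl
RUN apt-get update && apt-get install -y curl
# New layer: copy the application file into the image
COPY app.sh /app.sh
CMD ["/app.sh"]
EOF

# The first build runs every step; the second build hits the cache for each layer.
docker build -t layered-demo .
docker build -t layered-demo .
```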
libcontainer / runC
The library that provides the underlying Docker functionality is called libcontainer, now known as runC. For each execution, each container gets its own root file system, /tmp/docker/[uuid]/, and the underlying host does not let the process leave that directory, also called a chroot jail. One of the variables a process receives when it starts is its current working directory. A chroot jail is where the operating system prevents the started process from accessing any parent directory (such as ../) above that root.
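A rough illustration of a chroot jail using a container's exported file system (needs root; the container name and directory are placeholders, and this only sketches the idea rather than how runC sets things up internally):

```
# Create (but don't start) an Alpine container, then export its root file system.
docker create --name jail-demo alpine
mkdir rootfs && docker export jail-demo | tar -x -C rootfs
docker rm jail-demo

# Inside the chroot, ".." from the top still resolves to the new root:
# the process cannot reach anything above the rootfs directory.
sudo chroot rootfs /bin/sh -c 'cd / && cd .. && pwd'
```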
The Future: Unikernels
The idea behind unikernels is to strip out much of the kernel and unnecessary software (since an operating system is meant to be generic and do many different things) so you can run the application with only the minimum set of kernel modules and drivers it needs. If the application doesn't need networking, then no network drivers or related kernel code are included at all.
One of Docker's incredible advantages is providing an easy-to-use abstraction of virtualization without a performance hit (since the virtualization technology isn't really virtualization). Docker now seems to be taking the same developer-friendly tooling toward making applications developed in Docker run even faster than they could on their native operating system.
So that's all from my side for now,
Do share the article if you found it helpful.
Thanks.
Saksham Trivedi
- SK -