CONTAINER IMAGES - DEEP DIVE

What is a Container Image: A container image packages your application together with its dependencies and the metadata describing which process to run when a container is launched from it. You build a container image from a set of instructions written in a Dockerfile.

[root@homelab ~]$ cat Dockerfile 

FROM httpd:latest

WORKDIR /usr/local/apache2/htdocs

COPY webapp/ .

EXPOSE 80

Each instruction in this file that changes the filesystem (such as COPY, RUN, or ADD) adds a new "layer" to the container image; instructions such as EXPOSE only record metadata. Each layer stores only the differences from the layer below it, and all these layers are stacked together to form a read-only container image.
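To make the "each layer only stores the difference" idea concrete, here is a minimal in-memory sketch. It is not how Docker stores layers on disk; the dictionaries, the file contents, and the `flatten` helper are all hypothetical and exist only for illustration.

```python
# Toy model: each layer is a dict holding only what changed relative to the
# layer below it. A value of None marks a path deleted by that layer.
DELETED = None


def flatten(layers):
    """Stack layers bottom-to-top and return the resulting file tree."""
    tree = {}
    for layer in layers:
        for path, content in layer.items():
            if content is DELETED:
                tree.pop(path, None)
            else:
                tree[path] = content
    return tree


# Hypothetical layers loosely mirroring the Dockerfile above.
base_layer = {  # FROM httpd:latest
    "/usr/local/apache2/htdocs/index.html": "It works!",
    "/usr/local/apache2/conf/httpd.conf": "Listen 80",
}
copy_layer = {  # COPY webapp/ .
    "/usr/local/apache2/htdocs/index.html": "My web app",
    "/usr/local/apache2/htdocs/app.js": "console.log('hi')",
}

image = flatten([base_layer, copy_layer])
for path in sorted(image):
    print(path, "->", image[path])
```

The upper layer wins whenever two layers contain the same path, which is why index.html ends up with the web app's content while httpd.conf is inherited untouched from the base layer.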

How does that work?

Don't worry, I got you covered! You need to know a few things about this and in this order. ;)

  1. Union File Systems
  2. Copy-on-Write
  3. Overlay File Systems
  4. Snapshotters

Union File Systems (e.g. AUFS):

Wikipedia describes UnionFS as follows: "It allows files and directories of separate file systems, known as branches, to be transparently overlaid, forming a single coherent file system. Contents of directories which have the same path within the merged branches will be seen together in a single merged directory, within the new, virtual filesystem."

The idea here is that when multiple images contain identical data, that data is stored once and shared between them as a layer instead of being copied into every image.

Each layer is a file system and can be shared across multiple containers. For example, the base layer here, httpd, is the official Apache image and can be used by any number of containers. Since all our containers use the same base layer, imagine the disk space we just saved.

These image layers are always read-only, but when we create a new container from the image, a thin writable layer is added on top of it. Every file the container creates, modifies, or deletes is recorded in this writable layer, which is private to that container.
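The sharing described above can be sketched with Python's standard `collections.ChainMap`, which looks keys up across several mappings in order and sends every write to the first one. This is only an analogy for a union view; the layer contents and the `new_container` helper are made up for the example.

```python
from collections import ChainMap

# Read-only image layers, shared by every container (hypothetical contents).
httpd_base = {"/usr/local/apache2/conf/httpd.conf": "Listen 80"}
webapp_layer = {"/usr/local/apache2/htdocs/index.html": "My web app"}


def new_container():
    """Return a union view: a fresh writable dict on top of the shared layers."""
    # ChainMap searches the maps left to right and writes only to the first.
    return ChainMap({}, webapp_layer, httpd_base)


c1 = new_container()
c2 = new_container()

c1["/tmp/session"] = "only in c1"  # lands in c1's writable layer
c1["/usr/local/apache2/htdocs/index.html"] = "patched in c1"

print(c1["/usr/local/apache2/htdocs/index.html"])  # c1 sees its own version
print(c2["/usr/local/apache2/htdocs/index.html"])  # c2 still sees the image's
print(c1.maps[2] is c2.maps[2])                    # both share one base layer
```

Both containers reference the very same base and webapp dictionaries, so the image data exists once no matter how many containers are created, and a change made in one container never leaks into another.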

Copy-on-Write:

When we start a container, it appears to have an entire file system of its own. That would seem to mean that every container you run on the system needs its own copy of the file system. Wouldn't this take up a lot of disk space and make containers slow to start? No, because every container does not need its own copy of the file system!

[Image: copy-on-write illustration by Julia Evans]

We use a copy-on-write mechanism to achieve this. Instead of copying up front, the copy-on-write strategy shares a single instance of the data among all the processes that need access to it, and makes a copy only when a process needs to modify or write that data. All other processes continue to use the original data.

Docker makes use of the copy-on-write mechanism with both images and containers. To do this, the changes between the image and the running container are tracked by a graph driver in older versions and by a snapshotter now.

Before a write is performed on an existing file in the running container, a copy of that file is placed in the container's writable layer, and the write is applied to the copy. Now you know why it's called "copy-on-write".

This strategy optimizes both image disk space usage and the performance of container start times and works in conjunction with the Union File System.
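Here is a small sketch of the copy-up step using real files in a temporary directory. It assumes a flat directory with no subdirectories, and the `cow_read` and `cow_append` helpers are hypothetical names for this example; a real storage driver does this inside the kernel or the driver, not in user code like this.

```python
import shutil
import tempfile
from pathlib import Path


def cow_read(name, upper, lower):
    """Read from the writable layer if present, else from the shared layer."""
    for layer in (upper, lower):
        candidate = layer / name
        if candidate.exists():
            return candidate.read_text()
    raise FileNotFoundError(name)


def cow_append(name, text, upper, lower):
    """Append to a file, copying it up from the shared layer on first write."""
    target = upper / name
    if not target.exists() and (lower / name).exists():
        shutil.copy2(lower / name, target)  # the "copy" in copy-on-write
    with target.open("a") as handle:
        handle.write(text)


root = Path(tempfile.mkdtemp())
lower = root / "image"          # shared layer, treated as read-only
upper_a = root / "container_a"  # writable layer of container A
upper_b = root / "container_b"  # writable layer of container B
for directory in (lower, upper_a, upper_b):
    directory.mkdir()
(lower / "motd").write_text("hello\n")

cow_append("motd", "from A\n", upper_a, lower)

print(repr(cow_read("motd", upper_a, lower)))  # A sees its modified copy
print(repr(cow_read("motd", upper_b, lower)))  # B still reads the shared file
print((upper_b / "motd").exists())             # B never needed a copy
```

Only container A pays for a copy, and only for the one file it changed. Container B keeps reading the shared file, which is what keeps both disk usage and start-up time low.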

Overlay File System:

An overlay sits on top of an existing filesystem, combines an upper and a lower directory tree, and presents them as a single directory. These directories are called layers. The lower layer remains unmodified. Each layer adds only the differences from the layer below it, and this unification process is referred to as a "union mount".

As you can see from the image below, the lower directory (the image layer) is called "lowerdir" and the upper directory (the container layer) is called "upperdir". The final overlaid, unified view is called "merged".

[Image: OverlayFS diagram of lowerdir, upperdir, and merged, from the Docker docs]

Docker, by default, uses the overlay2 storage driver (OverlayFS) for this. The overlay2 driver requires Linux kernel 4.0 or higher and solves the problem of inode exhaustion (https://github.com/moby/moby/pull/22126) in the older overlay driver.

[nivedv@homelab ~]$ docker container run -d -p 80:80 httpd


[nivedv@homelab ~]$ sudo mount | grep overlay2 
overlay on /var/lib/docker/overlay2/a3a07027e3db46f05e3622501ace9627de967953e9737e2e12b19a649ed06df0/merged type overlay (rw,relatime,seclabel ......


[nivedv@homelab ~]$ sudo ls -l /var/lib/docker/overlay2/a3a07027e3db46f05e3622501ace9627de967953e9737e2e12b19a649ed06df0
total 28
drwxr-xr-x. 3 root root 4096 May 23 09:40 diff
-rw-r--r--. 1 root root   26 May 23 09:40 link
-rw-r--r--. 1 root root  173 May 23 09:40 lower
drwxr-xr-x. 1 root root 4096 May 23 09:40 merged
drwx------. 3 root root 4096 May 23 09:40 work
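The mount line above is truncated, but an overlay mount normally lists its lowerdir, upperdir, and workdir among the mount options. The sketch below splits such a line into its parts. The sample line uses shortened, made-up paths (it is not the output from this host), and the parser assumes no path contains a comma or a colon.

```python
def parse_overlay_mount(line):
    """Split one `mount` output line for an overlay mount into its parts."""
    # Expected shape: "<source> on <mountpoint> type <fstype> (<opt>,<opt>,...)"
    head, _, options = line.partition(" (")
    _source, _, rest = head.partition(" on ")
    mountpoint, _, fstype = rest.partition(" type ")
    parsed = {"mountpoint": mountpoint, "fstype": fstype, "lowerdir": []}
    for option in options.rstrip(")").split(","):
        key, _, value = option.partition("=")
        if key == "lowerdir":
            # Several lower layers are separated by ":", topmost first.
            parsed["lowerdir"] = value.split(":")
        elif key in ("upperdir", "workdir"):
            parsed[key] = value
    return parsed


# Hypothetical line with shortened, made-up paths.
sample = (
    "overlay on /var/lib/docker/overlay2/abc123/merged type overlay "
    "(rw,relatime,lowerdir=/var/lib/docker/overlay2/l/AAA:"
    "/var/lib/docker/overlay2/l/BBB,"
    "upperdir=/var/lib/docker/overlay2/abc123/diff,"
    "workdir=/var/lib/docker/overlay2/abc123/work)"
)

info = parse_overlay_mount(sample)
for key, value in info.items():
    print(key, "=", value)
```

Read this way, the directory listing above lines up with the mount: "merged" is the mount point the container sees, "diff" is the writable upperdir, and "work" is scratch space used internally by OverlayFS.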

So with the overlay2 driver, the layer structure is slightly different. Now, you have

  1. Base Layer: This is where the base files of your filesystem are located. In terms of container images, this layer is your base image (the "lowerdir").
  2. Overlay Layer: This layer is often called the "container layer", because it is where a running container sees and makes its changes, such as adding, deleting, or modifying files. It is a "union" view of the Base and Diff layers (the "merged" directory), and the changes made through it are stored in the Diff layer.
  3. Diff Layer: All changes made in the Overlay layer are stored in this layer (the "diff" directory, which acts as the "upperdir"). If you write to a file that already exists in the Base Layer, the overlay file system first copies the file to the Diff Layer and then applies your modification to the copy. This is called copy-on-write.

Snapshotters:

Container filesystems are built, managed, and distributed as layers, a job traditionally handled by graph drivers. But working with graph drivers is complicated and error-prone. Snapshotters are different from graph drivers in that they have no knowledge of images or containers.

Snapshotters work much like git, where every commit records the changes made to a tree of files. A snapshot represents a filesystem state. Snapshots have parent-child relationships using a set of directories. A diff can be taken between a snapshot and its parent to create a layer.

The snapshotter provides an API for allocating, snapshotting, and mounting abstract, layered file systems.
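The parent-child idea can be sketched as a toy, in-memory snapshotter. The method names `prepare` and `commit` loosely echo operations in containerd's snapshotter interface, but the class, its signatures, and the dictionary-based "filesystem" are invented for illustration and are not containerd's API.

```python
class ToySnapshotter:
    """In-memory stand-in: snapshots are file dicts linked to a parent."""

    def __init__(self):
        self._files = {}        # snapshot name -> {path: content}
        self._parent = {}       # snapshot name -> parent name (or None)
        self._committed = set()

    def prepare(self, key, parent=None):
        """Create a writable snapshot that starts as a copy of `parent`."""
        if parent is not None and parent not in self._committed:
            raise ValueError("parent must be a committed snapshot")
        self._files[key] = dict(self._files.get(parent, {}))
        self._parent[key] = parent
        return self._files[key]  # the caller mutates this "mount"

    def commit(self, name, key):
        """Freeze the active snapshot `key` under the name `name`."""
        self._files[name] = dict(self._files.pop(key))
        self._parent[name] = self._parent.pop(key)
        self._committed.add(name)

    def diff(self, name):
        """Return what `name` changed relative to its parent: one layer."""
        parent_files = self._files.get(self._parent[name], {})
        files = self._files[name]
        changed = {p: c for p, c in files.items() if parent_files.get(p) != c}
        removed = {p: None for p in parent_files if p not in files}
        return {**changed, **removed}  # None marks a deleted path


snaps = ToySnapshotter()

rootfs = snaps.prepare("base-work")
rootfs["/etc/os-release"] = "demo"
rootfs["/tmp/build.log"] = "scratch"
snaps.commit("base", "base-work")

rootfs = snaps.prepare("app-work", parent="base")
rootfs["/app/index.html"] = "My web app"
del rootfs["/tmp/build.log"]
snaps.commit("app", "app-work")

print(snaps.diff("app"))
```

Notice that the toy snapshotter knows nothing about images or containers. It only tracks filesystem states and their parents, and the layer falls out as the diff between a snapshot and its parent.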

Image Credits:

CoW - Julia Evans, @b0rk ( https://twitter.com/b0rk )

OverlayFS - Docker docs team ( https://docs.docker.com/storage/storagedriver/overlayfs-driver/ )


