Docker: EPISODE 1

This is Episode 1 of my Docker & K8s series. As promised, this series focuses on Docker and Kubernetes (K8s), but from a Data Engineer's perspective.

Today we are going to learn some concepts about hypervisors. I know the term "hypervisor" may sound a bit new to you folks, but please don't get confused: to learn Docker, we need a basic understanding of hypervisors.

What is a Hypervisor?

Let's say you have bought an Ubuntu laptop, and on a shiny Sunday morning you want to play FIFA 21 on it (sounds crazy, right?!). Some people will say you should use Wine (a compatibility layer for running Windows applications on Linux), but you know it's crap, so you decide to install Oracle VirtualBox, install Windows inside VirtualBox, and play FIFA there (please don't try it, it's a very bad experience).

But how is Windows running on a Linux laptop? As we all know, Windows uses the Windows NT kernel, which is different from the Linux kernel. You will say it's because of Oracle VirtualBox, blah blah blah... This is called virtualization, and the software/firmware that makes it possible is called a hypervisor.

A hypervisor (also called a virtual machine monitor, or VMM) is software or firmware that manages and allocates the hardware resources of a host machine to guest machines. There are two types of hypervisors, and their working principles differ quite a lot.

Types of Hypervisors

Type 1 Hypervisor

A Type 1 hypervisor, also known as a bare-metal hypervisor, is software or firmware that runs directly on the system hardware. Because there is no host OS underneath it, the hypervisor ships with its own kernel-level components (such as hardware drivers and a scheduler) to manage the machine. It can host, in principle, any number of operating systems or virtual machines as guests on the host hardware. The guests do not need to share the hypervisor's kernel: Microsoft Hyper-V, for example, can run both Windows and Ubuntu guests even though the two use different kernels. Generally, Type 1 hypervisors are faster than Type 2 hypervisors and offer stronger isolation.

Examples: Xen, VMware ESXi, and Microsoft Hyper-V.

Type 2 Hypervisor

A Type 2 hypervisor is software that runs on top of a host operating system. The hypervisor runs as a process on the host and allocates system resources to it, such as memory, persistent storage, and other essentials; the guest OS then runs inside this process and uses the resources available to it. Here the hypervisor acts as a middleman between the guest OS and the host OS, passing the guest's requests for hardware (disk, network, devices) through to the host OS. But don't get too happy: because a Type 2 hypervisor runs on the host OS, any problem in the host OS can cripple every guest machine running on it. Also, a Type 2 hypervisor is not in full control of the host's hardware, so it can run into scaling limitations.

Examples: Oracle VirtualBox, VMware Player.

If the same virtual machine runs on two host machines with the same hardware configuration, one through a Type 1 hypervisor and the other through a Type 2 hypervisor, then technically the machine with the Type 1 hypervisor would run faster. This is because there is only one layer, the hypervisor itself, between the virtual machine and the hardware. On the Type 2 hypervisor machine, there are two layers: the hypervisor and the host OS.
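To make the layer-counting argument concrete, here is a toy Python model (purely illustrative, not how any real hypervisor works) that lists what sits between a guest OS and the hardware in each case. The `STACKS` table and the function name are made up for this example.

```python
# Toy model, for illustration only: each list is the stack of software a
# guest's hardware request passes through, from the guest OS down to the hardware.
STACKS = {
    "type1": ["guest OS", "hypervisor", "hardware"],
    "type2": ["guest OS", "hypervisor", "host OS", "hardware"],
}


def layers_between_guest_and_hardware(kind: str) -> list[str]:
    """Return the intermediate layers, excluding the guest OS and the hardware."""
    stack = STACKS[kind]
    return stack[1:-1]


if __name__ == "__main__":
    for kind in STACKS:
        middle = layers_between_guest_and_hardware(kind)
        print(f"{kind}: {len(middle)} layer(s) -> {' -> '.join(middle)}")
```

Running it prints one layer (the hypervisor) for Type 1 and two layers (the hypervisor and the host OS) for Type 2.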

A Reality Check

Now you are so happy that you got a trial AWS account with a whopping 720 hours of EC2 instance time. When you create an EC2 instance with your desired OS type (AMI), storage, subnet, and VPC, Amazon does not buy hardware with your desired configuration for you. Instead, it creates a virtual machine on its large infrastructure using a hypervisor. EC2 uses Type 1 hypervisors: historically Xen, and on newer instance types the KVM-based Nitro hypervisor.

A Hypothetical Scenario

So far, we have learned about hypervisors and virtualization. Now let me give you a scenario: you have 30 applications that each need an isolated virtual environment. What are you gonna do?

Answer 1: Go to AWS/Azure/GCP and buy 30 VMs. Go ahead and do it; after a week you will understand why this option is so bad.

Answer 2: Create 30 VirtualBox VMs. But every VirtualBox VM consumes a lot of memory and CPU, so you will end up with a migraine. I mean, you will need a lot of RAM and a lot of CPU cores (practically, vertical scaling).

So, what is it gonna be?

This is where containerization comes into the picture.

What is a Container?

A container is an isolated execution environment in which one or more processes run. Creating containers and running your application as a process inside them is known as containerization. The Linux kernel provides the building blocks for this, letting you create many containers on a single Linux host; LXC (Linux Containers) is one userspace toolset built on top of those kernel features.

As stated, a process inside a container has an isolated environment, including its own network interfaces (and therefore IP addresses), process IDs (PIDs), mount points, and so on. The Linux kernel provides features such as namespaces and control groups (cgroups) out of the box to make this happen. In the coming episodes, I will definitely talk about cgroups, namespaces, and ECI in detail.
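On Linux you can see these namespaces yourself: every process exposes one symlink per namespace under `/proc/<pid>/ns`, and two processes in the same container point to the same namespace IDs. The sketch below, assuming a Linux host with `/proc` mounted, reads those links using only the standard library; `namespace_ids` and `same_namespace` are hypothetical helper names.

```python
import os

NS_DIR = "/proc/{pid}/ns"


def namespace_ids(pid="self"):
    """Map namespace type -> identifier, e.g. 'pid' -> 'pid:[4026531836]'.

    Linux-only: reads the symlinks under /proc/<pid>/ns. Returns an empty
    dict if the directory is unavailable (no /proc, or no such process).
    """
    ns_dir = NS_DIR.format(pid=pid)
    ids = {}
    try:
        entries = sorted(os.listdir(ns_dir))
    except OSError:
        return ids
    for name in entries:
        try:
            ids[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            # Some links may be unreadable, e.g. another user's process.
            continue
    return ids


def same_namespace(pid_a, pid_b, kind):
    """True if both processes are in the same namespace of the given kind."""
    a = namespace_ids(pid_a).get(kind)
    b = namespace_ids(pid_b).get(kind)
    return a is not None and a == b


if __name__ == "__main__":
    for kind, ident in namespace_ids().items():
        print(f"{kind:20} {ident}")
    print("same PID namespace as parent:", same_namespace("self", os.getppid(), "pid"))
```

If you run it once on the host and once inside a container, the `pid`, `net`, and `mnt` IDs would be expected to differ, which is exactly the isolation described above.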

In general (and not specifically related to Linux), a container is nothing but the set of processes we just talked about. A container has its own set of namespaces, and all processes running inside it get their share of resources from the container's control group. A process inside one container cannot see or interact with the resources allocated to other containers. All containers share the same kernel (that of the host operating system); when a container needs a different kernel, virtualization has to be provided.
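Control groups can be inspected the same way. The sketch below, assuming Linux and (for the limit lookup) cgroup v2 mounted at `/sys/fs/cgroup`, parses `/proc/<pid>/cgroup`, whose lines have the form `hierarchy-ID:controller-list:cgroup-path`, and then reads the `memory.max` file of that process's cgroup. The function names are hypothetical; a result of `None` means no limit was found or cgroup v2 is not in use.

```python
from pathlib import Path


def parse_proc_cgroup(text):
    """Parse /proc/<pid>/cgroup content into (hierarchy_id, controllers, path) tuples.

    On a pure cgroup v2 system there is a single line such as '0::/some/path'.
    """
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        hierarchy_id, controllers, path = line.split(":", 2)
        entries.append((int(hierarchy_id), controllers.split(",") if controllers else [], path))
    return entries


def memory_limit_bytes(pid="self", cgroup_root="/sys/fs/cgroup"):
    """Return the cgroup v2 memory limit for a process in bytes, or None."""
    try:
        entries = parse_proc_cgroup(Path(f"/proc/{pid}/cgroup").read_text())
    except OSError:
        return None
    v2_paths = [path for hid, _, path in entries if hid == 0]
    if not v2_paths:
        return None  # cgroup v1 only: the memory limit lives elsewhere
    limit_file = Path(cgroup_root) / v2_paths[0].lstrip("/") / "memory.max"
    try:
        value = limit_file.read_text().strip()
    except OSError:
        return None  # e.g. the root cgroup, which has no memory.max
    return None if value == "max" else int(value)  # "max" means unlimited


if __name__ == "__main__":
    print(parse_proc_cgroup(Path("/proc/self/cgroup").read_text()))
    print("memory limit:", memory_limit_bytes())
```

Inside a container started with a memory limit, the second line would be expected to show that limit, while on a typical desktop session it is often unlimited.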

Again speaking generally, a program that manages containers is called a container engine. It is also responsible for allocating system resources to the running containers by communicating with the kernel through system calls, and it runs as a daemon on the host operating system. In a nutshell, a container engine is like a Type 2 hypervisor, and a container is like a virtual machine.
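Docker is one such container engine, and its daemon exposes an HTTP API on a Unix socket. As a hedged sketch, assuming Docker's default socket path `/var/run/docker.sock` on Linux and a user allowed to access it, the standard library alone can ask the daemon for its running containers through the Engine API's `GET /containers/json` endpoint. `UnixHTTPConnection` and `list_running_containers` are names invented for this example, not part of any library.

```python
import http.client
import json
import socket

DOCKER_SOCKET = "/var/run/docker.sock"  # default Docker Engine socket on Linux


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that talks to a Unix domain socket instead of TCP."""

    def __init__(self, socket_path, timeout=5.0):
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def list_running_containers(socket_path=DOCKER_SOCKET):
    """Ask the daemon for running containers as (short_id, image, state) tuples.

    Returns None if the socket cannot be reached (daemon not installed or not
    running, no permission) or the daemon does not answer with HTTP 200.
    """
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", "/containers/json")
        response = conn.getresponse()
        if response.status != 200:
            return None
        containers = json.loads(response.read())
    except (OSError, http.client.HTTPException, ValueError):
        return None
    finally:
        conn.close()
    return [(c["Id"][:12], c["Image"], c["State"]) for c in containers]


if __name__ == "__main__":
    print(list_running_containers())
```

The point is only that the engine is a long-running daemon you talk to; the daemon, in turn, makes the kernel system calls that create namespaces and cgroups for each container.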

Since containers are isolated environments created directly on the host operating system, without a hypervisor, containerization is sometimes called OS-level virtualization.
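One quick way to see OS-level virtualization in action is to compare the userland a process sees with the kernel it runs on. This minimal sketch, assuming Linux with an `/etc/os-release` file, prints both; `parse_os_release` and `describe_environment` are hypothetical helpers, and the parser is deliberately simplistic (it does not handle escaped characters).

```python
import platform
from pathlib import Path


def parse_os_release(text):
    """Parse /etc/os-release style KEY=value lines into a dict (simplified)."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        info[key] = value.strip("\"'")  # drop surrounding quotes
    return info


def describe_environment():
    """Return (userland distribution name, kernel release) seen by this process."""
    try:
        text = Path("/etc/os-release").read_text()
        distro = parse_os_release(text).get("PRETTY_NAME", "unknown")
    except OSError:
        distro = "unknown"
    return distro, platform.release()  # kernel release comes from uname


if __name__ == "__main__":
    distro, kernel = describe_environment()
    print(f"userland: {distro}")
    print(f"kernel:   {kernel}")
```

If you run it on the host and then in, say, an Alpine-based container on that same host, you would expect the userland line to change while the kernel line stays the same, because the container shares the host's kernel.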


More articles by Koushik Dutta

  • Anatomy of a Dockerfile
  • Docker Architecture
  • HOW DOCKER WORKS
  • The Celebrity Of Design Pattern