From Command to Servicing: The Complex Process Behind a Kubernetes Pod Creation

In #Kubernetes, building and managing a component is a continuous process. A simple command to create a Pod is executed through several #eventdriven collaborations across different parts of the cluster's control plane and worker nodes.

Every deployable component in Kubernetes, such as a Pod, has a desired-state specification, detailed in the configuration provided in the deployment API call.

Kubernetes, in principle, is responsible for creating the component and continuously driving its actual state toward the desired state through ongoing monitoring, management, and optimization of the component's state and the cluster's workloads and resources.
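At its core, this is the controller pattern: a reconcile loop that repeatedly compares desired and actual state and acts on the difference. Here is a minimal, illustrative Go sketch of the idea; the functions fetchDesired, observeActual, and converge are hypothetical placeholders, not Kubernetes APIs.

```go
package main

import (
	"reflect"
	"time"
)

// State is a stand-in for any component's specification and status
// (a hypothetical type, purely for illustration).
type State struct {
	Replicas int
	Image    string
}

// fetchDesired and observeActual are hypothetical placeholders for
// reading the declared spec and the live cluster state.
func fetchDesired() State  { return State{Replicas: 3, Image: "nginx:1.25"} }
func observeActual() State { return State{Replicas: 2, Image: "nginx:1.25"} }

// converge is a hypothetical placeholder for the corrective action
// (creating, updating, or deleting resources).
func converge(desired State) {}

func main() {
	// The control loop: observe, compare, act, repeat.
	for {
		desired, actual := fetchDesired(), observeActual()
		if !reflect.DeepEqual(desired, actual) {
			converge(desired)
		}
		time.Sleep(5 * time.Second)
	}
}
```

Every controller described below, from the scheduler to the kubelet, is a variation on this loop.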

Let us see the internal event-driven collaboration steps involved in creating a Pod component in Kubernetes.

The high-level illustrative view


Step 1 - The interaction between the client and the API server

When a user with Kubernetes client API access executes the pod deployment command with the deployment specification, the request is received by the #API server and processed through the following steps (a client-side sketch in Go follows the list).

  • Activity-1 - The API server authenticates and authorizes the request.
  • Activity-2 - The API server validates the request and performs any necessary transformations on the data as needed.
  • Activity-3 - The API server persists the object to the #ETCD key-value store, which holds the configuration data for all Kubernetes objects. ETCD stores the data in its distributed key-value store and returns a response to the API server.
  • Activity-4 - The API server receives the response from ETCD and sends a response back to the user or the Kubernetes component that made the original request, with the status of the component.
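To make Activities 1 through 4 concrete from the client's side, here is a minimal sketch using the official Go client, client-go; the pod name, namespace, and image are arbitrary example values.

```go
package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load kubeconfig (~/.kube/config); the API server address and the
	// credentials used for Activity-1 (authn/authz) come from here.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// The desired-state specification for the pod (example values).
	pod := &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "demo-pod", Namespace: "default"},
		Spec: corev1.PodSpec{
			Containers: []corev1.Container{
				{Name: "app", Image: "nginx:1.25"},
			},
		},
	}

	// The API server validates the request, persists the object to ETCD,
	// and returns the stored object (Activities 2-4).
	created, err := clientset.CoreV1().Pods("default").Create(
		context.TODO(), pod, metav1.CreateOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Println("created pod:", created.Name, "phase:", created.Status.Phase)
}
```

A kubectl command such as kubectl apply drives exactly the same REST path; the CLI is just another API client.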

Step 2 - The collaboration between the scheduler and the API server

Once the object definition is available in the ETCD store, the pod must be scheduled for deployment to a node where the actual compute, #network, and #storage resources are available. The #scheduler component of the #controlplane is responsible for scheduling: it selects a node to run each new pod based on the resource #requirements, #affinity and anti-affinity rules, and other constraints specified in the pod's deployment descriptor.

The scheduler is a separate process that runs on each control plane node of the Kubernetes cluster. It operates a continuous loop: it watches the API server for pods that have not yet been scheduled, decides which node each pod should be placed on, and then updates the API server with the node assignment. A minimal sketch of that watch follows.
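In this client-go sketch, filtering on an empty spec.nodeName field is how unscheduled pods are identified; the cluster connection setup is the same as in the earlier example.

```go
package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(config)

	// Pods with an empty spec.nodeName are exactly the unscheduled ones.
	watcher, err := clientset.CoreV1().Pods(metav1.NamespaceAll).Watch(
		context.TODO(), metav1.ListOptions{FieldSelector: "spec.nodeName="})
	if err != nil {
		panic(err)
	}
	for event := range watcher.ResultChan() {
		if pod, ok := event.Object.(*corev1.Pod); ok {
			fmt.Printf("unscheduled pod: %s %s/%s\n", event.Type, pod.Namespace, pod.Name)
			// A real scheduler would now run its filter/score cycle
			// and bind the pod to the chosen node.
		}
	}
}
```

In the real kube-scheduler this watch is mediated by informers and a scheduling queue, but the selection criterion is the same.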

Here is a high-level overview of the Kubernetes scheduler's process flow (a toy filter-and-score sketch in Go follows the list):

  • Activity-5 - The scheduler listens for newly created pods (typically produced by workload controllers for objects such as Deployments or StatefulSets) that do not yet have a node assignment.
  • Activity-6 - The scheduler examines each pod to determine its scheduling requirements, such as resource requests, affinity/anti-affinity rules, and other constraints.
  • Activity-7 - The scheduler then filters out nodes unsuitable for the workload based on the availability of required resources (e.g., CPU, memory, storage), node labels, taints, and other constraints.
  • Activity-8 - The scheduler then scores the remaining nodes based on a set of configurable scheduling policies, including factors such as node resource usage, workload affinity/anti-affinity, node locality, and others.
  • Activity-9 - The scheduler selects the node with the highest score and assigns the workload to that node.
  • Activity-10 - The scheduler notifies the API server of its decision, and the API server updates the pod's status to reflect the new node assignment.
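To make the filter-and-score cycle concrete, here is a toy sketch; the Node and Pod structs and the scoring policy are invented for illustration and are far simpler than the real kube-scheduler's plugin framework.

```go
package main

import (
	"fmt"
	"sort"
)

// Toy models: only CPU and memory are considered here (the real
// scheduler evaluates many more constraints via its plugins).
type Node struct {
	Name               string
	FreeCPU, FreeMemMB int
}

type Pod struct {
	Name             string
	ReqCPU, ReqMemMB int
}

// filter drops nodes that cannot fit the pod's resource requests.
func filter(nodes []Node, pod Pod) []Node {
	var feasible []Node
	for _, n := range nodes {
		if n.FreeCPU >= pod.ReqCPU && n.FreeMemMB >= pod.ReqMemMB {
			feasible = append(feasible, n)
		}
	}
	return feasible
}

// score prefers the node with the most free resources remaining
// (a "least allocated" style policy; one of many possible policies).
func score(n Node, pod Pod) int {
	return (n.FreeCPU - pod.ReqCPU) + (n.FreeMemMB-pod.ReqMemMB)/1024
}

func main() {
	nodes := []Node{
		{"node-a", 2, 4096},
		{"node-b", 8, 16384},
		{"node-c", 4, 2048},
	}
	pod := Pod{"demo-pod", 2, 2048}

	feasible := filter(nodes, pod)
	sort.Slice(feasible, func(i, j int) bool {
		return score(feasible[i], pod) > score(feasible[j], pod)
	})
	fmt.Println("assign", pod.Name, "to", feasible[0].Name)
}
```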

Step 3 - The collaboration between the kubelet and the API server

The #KUBELET is the Kubernetes agent that runs on each node. It watches the API server for pods assigned to its node, retrieves the pod specification (stored in ETCD) through API server calls, and ensures that the pod's containers are running using the node's resources.

  • Activity-11 - If the pod's containers are not already running, the kubelet works through the Container Runtime Interface (CRI) with the container engine and container runtime to pull the container images from the container registry and start the containers.
  • Activity-12 - The kubelet works with the configured network plugin to set up networking for the pod, including assigning an IP address and configuring the network interface.
  • Activity-13 - The kubelet monitors the pod's status and reports any issues to the Kubernetes API server.
  • Activity-14 - When the pod is no longer needed, the user or an automation tool issues a command to delete it. The Kubernetes API server deletes the pod specification from the ETCD datastore, and the kubelet stops the pod's containers and removes the pod's networking configuration.

Overall, the container runtime process flow involves creating a pod specification, scheduling the pod to a node, starting the pod's containers, and monitoring the pod's status. The kubelet on each node is responsible for managing the pod's containers and ensuring that the pod runs correctly. A simplified sketch of the kubelet's first CRI call follows.
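Activity-11 happens over the CRI, a gRPC API. Below is a simplified sketch of the first CRI call the kubelet makes for a new pod: creating the pod sandbox (the pause container discussed later). It assumes containerd's default socket path, and a real kubelet supplies far richer sandbox and container configuration.

```go
package main

import (
	"context"
	"fmt"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	runtimeapi "k8s.io/cri-api/pkg/apis/runtime/v1"
)

func main() {
	// Dial the container runtime's CRI socket (containerd's default path).
	conn, err := grpc.Dial("unix:///run/containerd/containerd.sock",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	rt := runtimeapi.NewRuntimeServiceClient(conn)
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// The kubelet first creates the pod "sandbox": the pause container
	// that owns the pod's shared namespaces (example metadata values).
	sb, err := rt.RunPodSandbox(ctx, &runtimeapi.RunPodSandboxRequest{
		Config: &runtimeapi.PodSandboxConfig{
			Metadata: &runtimeapi.PodSandboxMetadata{
				Name:      "demo-pod",
				Namespace: "default",
				Uid:       "demo-uid-123",
			},
		},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println("sandbox created:", sb.PodSandboxId)

	// The application containers would then be created inside this
	// sandbox via PullImage, CreateContainer, and StartContainer calls.
}
```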

Step 4 - The kubelet's collaboration with the container engine and container runtime

A #containerengine is a component responsible for managing and executing container processes on a host machine. Examples of container engines include Docker, rkt, and CRI-O. These engines provide the low-level functionality required to create, run, and manage containers, including container lifecycle management, networking, storage, and security.

A #containerruntime, on the other hand, is responsible for executing the container images on the host machine. It provides an interface between the container engine and the container images, allowing the engine to interact with the images to create and manage containers. Examples of container runtimes include containerd, CRI-O, and runc, with containerd and CRI-O acting as high-level runtimes that delegate to a low-level OCI runtime such as runc.

Kubernetes can work with multiple container runtimes and engines, depending on specific needs and preferences. When deploying Kubernetes, we can choose the container engine or runtime we prefer; the cluster deployment will set up the corresponding processes on the node hosts, and the kubelet will use that engine to create and manage containers for the application.


There are notable differences among the best-known low-level container runtimes: runc, Kata-runtime, and Clear Containers.

  • #Runc: Runc is a lightweight, standalone command-line tool for running containers according to the Open Container Initiative (OCI) specification. It is used by popular container engines and runtimes such as Docker and containerd. Runc provides process isolation using Linux namespaces, cgroups, and seccomp. It does not use hardware virtualization or nested virtualization.
  • #Kata-runtime: Kata-runtime is a container runtime that uses lightweight virtual machines (VMs) to isolate containers. It implements the same OCI specification as Runc but adds an extra layer of isolation through hardware virtualization. Kata-runtime creates a lightweight VM for each container and isolates each container at the hypervisor level, which provides additional security and separation between containers.
  • Clear Containers: Clear Containers is an alternative runtime for Docker that uses hardware virtualization to provide container isolation. Like Kata-runtime, it creates a lightweight VM for each container, but it uses the Intel Clear Containers technology to achieve this. This technology uses a lightweight hypervisor and Intel Virtualization Technology (VT-x) to provide hardware-assisted isolation.

In typical scenarios, containerd uses runc under the hood as the default low-level runtime to create and manage containers.

  • Kubernetes relies on containerd's implementation of the CRI [Container Runtime Interface] specification to interface with runc and manage containers running on Kubernetes nodes.
  • When the kubelet needs to start a new container, it sends a request to containerd, which in turn uses runc to create the container (see the sketch below).
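As an illustration of that hand-off, here is a condensed sketch based on containerd's Go client, adapted from the pattern in containerd's getting-started documentation; the image, namespace, and container names are example values.

```go
package main

import (
	"context"
	"log"

	"github.com/containerd/containerd"
	"github.com/containerd/containerd/cio"
	"github.com/containerd/containerd/namespaces"
	"github.com/containerd/containerd/oci"
)

func main() {
	// Connect to containerd's socket.
	client, err := containerd.New("/run/containerd/containerd.sock")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// containerd scopes all resources by namespace.
	ctx := namespaces.WithNamespace(context.Background(), "demo")

	// Pull and unpack an image.
	image, err := client.Pull(ctx, "docker.io/library/nginx:1.25", containerd.WithPullUnpack)
	if err != nil {
		log.Fatal(err)
	}

	// Create the container metadata and snapshot; containerd will
	// drive runc underneath when the task starts.
	container, err := client.NewContainer(ctx, "demo-container",
		containerd.WithImage(image),
		containerd.WithNewSnapshot("demo-snapshot", image),
		containerd.WithNewSpec(oci.WithImageConfig(image)),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer container.Delete(ctx, containerd.WithSnapshotCleanup)

	// The task is the running instance; under the hood this maps to
	// runc's create and start phases.
	task, err := container.NewTask(ctx, cio.NewCreator(cio.WithStdio))
	if err != nil {
		log.Fatal(err)
	}
	defer task.Delete(ctx)

	if err := task.Start(ctx); err != nil {
		log.Fatal(err)
	}
	log.Println("container started with pid", task.Pid())
}
```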

A high-level overview of the lifecycle of a container managed by runc follows, with a small command-line sketch after the list.

  • Create a container: This involves creating a container configuration file that specifies the container's runtime parameters, such as its root filesystem, hostname, network settings, and resource constraints.
  • Start the container: This involves launching a new process inside the container's namespace using the container runtime specified in the container configuration file. The process is typically the container's init process, which is responsible for starting and managing all other processes running in the container.
  • Manage the container: Once the container is running, runc can be used to manage its lifecycle, including stopping, pausing, resuming, and checkpointing the container. These actions are implemented by sending signals to the container's init process.
  • Stop the container: This involves sending a signal to the container's init process to stop all processes running in the container and then cleaning up any associated resources, such as network interfaces and storage devices.
  • Delete the container: This involves removing the container's root filesystem and any other associated resources, such as network interfaces and storage devices.
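Here is a minimal sketch of this lifecycle that shells out to the runc binary; it assumes an OCI bundle (a config.json plus a rootfs/ directory, e.g. generated with runc spec) already exists at ./bundle, and it must run as root.

```go
package main

import (
	"log"
	"os/exec"
)

// run executes a runc subcommand and logs its combined output.
func run(args ...string) {
	out, err := exec.Command("runc", args...).CombinedOutput()
	if err != nil {
		log.Fatalf("runc %v: %v\n%s", args, err, out)
	}
	log.Printf("runc %v:\n%s", args, out)
}

func main() {
	const id = "demo-container"

	run("create", "--bundle", "./bundle", id) // set up namespaces, cgroups, rootfs
	run("start", id)                          // exec the container's init process
	run("state", id)                          // inspect the lifecycle state (JSON)
	run("kill", id, "KILL")                   // signal the init process
	run("delete", "--force", id)              // clean up cgroups and state
}
```

The --force flag on delete covers the case where the killed init process has not yet been reaped.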

Container Engine and Runtime collaboration with host server resources

In #Linux, process separation is done through namespaces. #Namespaces are a kernel feature that partitions kernel resources so that one set of processes sees one set of resources while another set of processes sees a different set. Six namespaces are commonly used for containers, as listed below (a small Go sketch follows the list).

  • Network namespace isolates the network stack, including network interfaces, routing tables, and firewall rules.
  • UTS namespace isolates the hostname and domain name of the container.
  • IPC namespace isolates the System V IPC (Inter-Process Communication) resources, such as shared memory segments and message queues.
  • Mount namespace isolates the filesystem mount points. Each container has its own mount namespace, which means that processes running in different containers cannot see each other's filesystems.
  • User namespace isolates user and group IDs so that processes running in different containers cannot see each other's users and groups.
  • Process ID namespace isolates the PID (process ID) number space so that each container has its own set of process IDs.
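The following Go sketch shows how a process can be launched into fresh namespaces, which is essentially what container runtimes do at a much larger scale (Linux-only, requires root).

```go
//go:build linux

package main

import (
	"log"
	"os"
	"os/exec"
	"syscall"
)

func main() {
	// Launch a shell in fresh UTS, PID, and mount namespaces.
	cmd := exec.Command("/bin/sh")
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Cloneflags: syscall.CLONE_NEWUTS |
			syscall.CLONE_NEWPID |
			syscall.CLONE_NEWNS,
	}
	if err := cmd.Run(); err != nil {
		log.Fatal(err)
	}
	// Inside that shell, `hostname container-demo` changes the hostname
	// only within the new UTS namespace, and `echo $$` prints 1 because
	// the shell is the first process in the new PID namespace.
}
```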


During Pod creation, the following steps take place.

  • First, create the pod's sandbox as a pause container: a process whose shared namespaces can be inherited by all of the actual application-specific containers defined in the pod deployment spec.
  • Assign the pod an IP address from the cluster's pod IP address block, according to the cluster configuration specs.
  • Establish a network bridge between the node and pod so that they can communicate with each other.
  • Create each application-specific container by inheriting the pause container so that certain namespaces will be shared across all application-specific containers within a pod, as detailed below.

Containers in the same pod share the network, IPC, and UTS namespaces. Because they share the IPC namespace, they can communicate using standard inter-process communication mechanisms such as System V semaphores or POSIX shared memory. Because they share the network namespace, containers in a pod can reach one another via "localhost", and each container's observable hostname is the pod's name (a consequence of the shared UTS namespace). Since all containers share the same IP address and port space, each application-specific container [a container created per the pod configuration specification to meet the client-specific deployment needs] must listen on a different port for incoming connections, as the sketch below illustrates.
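For example, a sidecar container can reach its sibling over localhost. This hypothetical Go sketch assumes the app container in the same pod listens on port 8080 and exposes a /healthz endpoint; both names are invented for illustration.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
)

func main() {
	// Runs inside a sidecar container: because all containers in the pod
	// share one network namespace, the app container listening on port
	// 8080 is reachable on localhost, with no pod IP or DNS lookup needed.
	resp, err := http.Get("http://localhost:8080/healthz")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	fmt.Printf("app container replied %s: %s\n", resp.Status, body)
}
```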

Conclusion

Kubernetes has revolutionized IT infrastructure by providing unparalleled #scalability, #reliability, and #flexibility.

It has catalyzed businesses to move away from the traditional VM-based approach to modern container-based technologies and allowed them to substantially reduce costs with its more efficient resource utilization.

As the use of Kubernetes continues to expand, it will remain an essential tool in any organization's arsenal as they continue their journey toward digital transformation.

Kubernetes is a prime example of managing containers through event-driven #architecture : many asynchronous processes collaborating across client, platform, and low-level infrastructure interfaces to run containerized business application workloads with the reliability and availability that digital-era customers expect.
