Kubernetes

### Kubernetes Overview

At the highest level, Kubernetes is two things:

- A cluster for running applications

- An orchestrator of cloud-native microservices apps.

On the cluster front, Kubernetes is like any other cluster – a bunch of nodes and a control plane. The control plane exposes an API, has a scheduler for assigning work to nodes, and state is recorded in a persistent store. Nodes are where application services run.

Kubernetes is API-driven and uses standard HTTP RESTful verbs to view and update the cluster.

On the orchestrator front, “orchestration” means coordinating an application that's made from lots of small independent services so that they work together to form a useful app.

To make this happen, we start out with an app, package it up and give it to the cluster (Kubernetes). The cluster is made up of one or more masters and a bunch of nodes.

The masters, sometimes called heads or head nodes, are in charge of the cluster. This means they make the scheduling decisions, perform monitoring, implement changes, respond to events, and more. For these reasons, we often refer to the masters as the control plane.

The nodes are where application services run, and we sometimes call them the data plane. They have a reporting line back to the masters, and constantly watch for new work assignments.

To run applications on a Kubernetes cluster we follow this simple pattern:

1. Write the application as small independent services in our favourite languages.

2. Package each service in its own container.

3. Wrap each container in its own Pod.

4. Deploy Pods to the cluster via higher-level controllers such as Deployments, DaemonSets, StatefulSets, and CronJobs.

We're still near the beginning of the book and you're not expected to know what all of this means yet. However, at a high-level, Deployments offer scalability and rolling updates, DaemonSets run one instance of a Pod on every node in the cluster, StatefulSets are for stateful application components, and CronJobs are for work that needs to run at set times. There are more than these, but these will do for now.

Kubernetes likes to manage applications declaratively. This is a pattern where we describe how we want our application to look and feel in a set of YAML files, POST these files to Kubernetes, then sit back while Kubernetes makes it all happen.
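
As a minimal sketch of that workflow (assuming a manifest file called app.yml that you have already written):

```
kubectl apply -f app.yml    # POST the desired state to the API server
kubectl get deployments     # see what Kubernetes has implemented
```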

### Masters and Nodes

A Kubernetes cluster is made up of masters and nodes.

#### Masters aka Control Plane

A Kubernetes master is a collection of system services that make up the control plane of the cluster.

- To have a proper control plane, HA (High Availability) is important.

- It is good practice not to run user applications on masters; the control plane is better left to managing the cluster.

#### API Server

The API server is the core of Kubernetes. All communication between components goes through it. Both internal system components and external users communicate via the same API.

It exposes a RESTful API over HTTPS, to which we POST YAML configuration files. These YAML files, called manifests, contain the desired state of an application. This includes things like which container image to use, which ports to expose, and how many Pod replicas to run.

All requests to the API server are authenticated and authorized. Once these checks are done, the config file is validated, persisted to the cluster store, and deployed to the cluster.

***Note***: Think of the API Server as the brain of the cluster.
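
If you want to see the RESTful nature of the API server for yourself, one simple way is via kubectl proxy, which handles authentication for you. A minimal sketch:

```
# Start a local proxy to the API server
kubectl proxy --port=8001 &

# Browse the API with plain HTTP verbs
curl http://localhost:8001/api/v1/namespaces/default/pods
```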

#### The Cluster Store

The cluster store stores the entire configuration and state of the cluster. It is based on etcd, and it is advised to run the cluster store in an HA configuration. etcd prefers consistency over availability. This means it does not tolerate cluster-splitting situations and will stop working in order to maintain consistency. However, if etcd becomes unavailable, the cluster should continue to work; it's just that updates to the cluster configuration become unavailable.

Write consistency to the database is crucial for the cluster store. For example, multiple writes to the same value originating from different nodes need to be handled. etcd uses the RAFT consensus algorithm to achieve this.
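
How etcd is deployed varies; on kubeadm-style clusters it typically runs as static Pods in the kube-system namespace, so a quick (cluster-dependent) health check looks like this:

```
# Works on kubeadm-style clusters where etcd runs as static Pods
kubectl -n kube-system get pods -l component=etcd
```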

#### The Controller Manager

The controller manager is the controller of controllers. It ships as a single monolithic binary that implements multiple independent control loops to watch the cluster and respond to events.

The following are some of the control loops:

- node controller

- endpoints controller

- replicaset controller

Each one of these runs as a background watch-loop monitoring the API Server for changes. The aim is to make sure the current state of the cluster matches the desired state.

The logic implemented by each control loop is as follows:

- Obtain desired state

- Observe current state

- Determine difference

- Reconcile difference

Each control loop is specialized and only interested in its own processes. It takes care of its own tasks and leaves other components alone.
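
You can observe a reconciliation loop in action from the command line. Assuming a Deployment or ReplicaSet is already managing some Pods, deleting one of them triggers its controller to create a replacement (the Pod name below is hypothetical):

```
kubectl get pods                        # note the current Pod names
kubectl delete pod hello-deploy-7f9xk   # hypothetical Pod name
kubectl get pods --watch                # a replacement Pod appears to restore desired state
```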

#### The Scheduler

At a high level, the scheduler watches for new work tasks and assigns them to appropriate healthy nodes. Behind the scenes, it implements complex logic that filters out nodes incapable of running the Pod and then ranks the nodes that are capable. The ranking system itself is complex, but the node with the highest ranking is eventually selected to run the Pod.

When identifying nodes capable of running the Pod, the scheduler performs various predicate checks. These include: is the node tainted, are there any affinity or anti-affinity rules, is the Pod's network port available on the node, and does the node have sufficient free resources. Any node incapable of running the Pod is ignored, and the remaining nodes are ranked according to things such as: does the node already have the required image, how much free resource does the node have, and how many Pods is the node already running. Each criterion is worth points, and the node with the most points is selected to run the Pod.

If the scheduler cannot find a suitable node, the Pod cannot be scheduled and goes into the pending state.
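
A handy way to investigate a Pod stuck in pending is to inspect its events; a minimal sketch (the Pod name is illustrative):

```
kubectl get pods                 # the STATUS column shows Pending
kubectl describe pod hello-pod   # the Events section explains why scheduling failed
```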

It's not the job of the scheduler to perform the mechanics of running Pods; it just picks the nodes they will be scheduled on.

#### The Cloud Controller Manager

This is only applicable if you're running your cluster on a supported public cloud platform, in which case your control plane will be running a cloud controller manager. Its job is to manage integrations with underlying cloud technologies and services such as instances, load-balancers, and storage.

#### Control Plane Summary

The Kubernetes masters run all of the control plane services. Behind the scenes, a master is made up of lots of small specialized control loops and services such as the following:

- API Server

- The Cluster Store

- The Controller Manager

- The Scheduler

The API Server is the front-end into the control plane and the only control plane component we interact with directly, via its RESTful API, usually on port 443.

```
Master/Head/Control Plane
|--------------------------------|
            API SERVER
           ^    ^    ^
           |    |    |
           v    |    v
   Scheduler    |    Controllers
                v
          Cluster Store
|--------------------------------|
```

#### Nodes

Nodes are the workers of a Kubernetes cluster. At a high level they do three things:

- Watch the API Server for new work assignments

- Execute new work assignments

- Report back to the control plane

```
Node
- Kubelet
- Container Runtime (CRI)
- Network Proxy (kube-proxy)
```

##### Kubelet

The kubelet is the main Kubernetes agent and runs on every node in the cluster. When you join a new node to a cluster, the process involves installing the kubelet, which is then responsible for registering the node with the cluster. This effectively pools the node's CPU, RAM, and storage into the wider cluster pool. One of the main jobs of the kubelet is to watch the API server for new work assignments. Any time it sees one, it executes the task and maintains a reporting channel back to the control plane. It also keeps an eye on local static Pod definitions.

If a kubelet can't run a particular task, it reports back to the master and lets the control plane decide what actions to take. For example, if a Pod fails to start on a node, the kubelet is not responsible for finding another node to run it on. It simply reports back to the control plane and the control plane decides what to do.
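
You can see the results of kubelet registration and its reporting channel with the standard node commands (replace <node-name> with one of your nodes):

```
kubectl get nodes                   # every node with a registered kubelet shows up here
kubectl describe node <node-name>   # capacity, conditions, and heartbeats reported by the kubelet
```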

##### Container Runtime

The Kubelet needs a container runtime to perform container-related tasks – things like pulling images and starting and stopping containers.

In the early days, Kubernetes had native support for a few container runtimes such as Docker. More recently, it has moved to a plugin model called the Container Runtime Interface (CRI). This is an abstraction layer for external (3rd-party) container runtimes to plug in to. At a high-level, the CRI masks the internal machinery of Kubernetes and exposes a clean documented interface for 3rd-party container runtimes to plug in to.

The CRI is the supported method for integrating runtimes into Kubernetes.

There are lots of container runtimes available for Kubernetes. One popular example is cri-containerd. This is a community-based open-source project porting the CNCF containerd runtime to the CRI interface. It has a lot of support and is replacing Docker as the preferred container runtime used in Kubernetes.

##### Kube-Proxy

This runs on every node in the cluster and is responsible for local networking. For example, it makes sure each node gets its own unique IP address, and implements local IPTABLES or IPVS rules to handle routing and load-balancing of traffic on the Pod network.
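
If your cluster runs kube-proxy in iptables mode, you can peek at the NAT rules it maintains on a node. This is a hedged, cluster-dependent check and the output varies between environments:

```
# Run directly on a node; requires root privileges.
# Lists the Service-related NAT rules written by kube-proxy in iptables mode.
sudo iptables -t nat -L KUBE-SERVICES
```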

##### Kubernetes DNS

Every Kubernetes cluster has an internal DNS service that is vital to operations. The cluster's DNS service has a static IP address that is hard-coded into every Pod on the cluster, meaning all containers and Pods know how to find it. Every new service is automatically registered with the cluster's DNS so that all components in the cluster can find every Service by name. Some other components that are registered with the cluster DNS are StatefulSets and the individual Pods that a StatefulSet manages.

Cluster DNS is based on CoreDNS ([https://coredns.io/](https://coredns.io/)).
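
To see the cluster DNS in action, you can look up the DNS Service itself and resolve a Service name from inside a Pod. This is a sketch – my-pod and my-service are illustrative, and the container image must include nslookup:

```
# The cluster DNS Service traditionally lives in kube-system as "kube-dns"
kubectl -n kube-system get svc kube-dns

# Services resolve by name from inside any Pod:
#   <service>.<namespace>.svc.cluster.local
kubectl exec -it my-pod -- nslookup my-service.default.svc.cluster.local
```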

#### Packaging Apps

For an application to run on a Kubernetes cluster, the following things must be done:

- Packaged as a container

- Wrapped in a Pod

- Deployed via a declarative manifest file

We write an application service in a language of our choice. We then build it into a container image and store it in a registry. At this point, the application service is containerized.

Next, we define a Kubernetes Pod to run the containerized service in. At the kind of high level we're at, a Pod is just a wrapper that allows containers to run on a Kubernetes cluster. Once we've defined a Pod for the container, we're ready to deploy it on the cluster.

Kubernetes offers several objects for deploying and managing Pods. The most common is the Deployment, which offers scalability, self-healing, and rolling updates. We define them in a YAML file that specifies things like which image to use and how many replicas to deploy.

```
- Deployment
    Scaling, updates, rollbacks
- Pod
    Kubernetes' atomic unit of deployment
- Container
    Application code
```

Once everything is defined in the Deployment YAML file, we POST it to the cluster as the desired state of the application and let Kubernetes implement it.
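
As a minimal sketch, a Deployment manifest along these lines captures that desired state (the names and image are illustrative, not from a real registry):

```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-deploy            # illustrative name
spec:
  replicas: 3                   # how many Pod replicas to run
  selector:
    matchLabels:
      app: hello
  template:                     # the Pod this Deployment wraps
    metadata:
      labels:
        app: hello
    spec:
      containers:
      - name: hello-ctr
        image: example/hello:1.0    # illustrative image
        ports:
        - containerPort: 8080
```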

#### The Declarative Model and Desired State

In Kubernetes, the declarative model works like this:

1. Declare the desired state of the application (microservice) in a manifest file

2. POST it to the Kubernetes API server

3. Kubernetes stores this in the cluster store as the application's desired state

4. Kubernetes implements the desired state on the cluster

5. Kubernetes implements watch loops to make sure the current state of the application doesn't vary from the desired state

Manifest files are written in simple YAML, and they tell Kubernetes how we want an application to look. We call this the desired state. It includes things such as which image to use, how many replicas to have, which network ports to listen on, and how to perform updates.

Once we've created the manifest, we POST it to the API server. The most common way of doing this is with the kubectl command-line utility. This POSTs the manifest as a request to the control plane, usually on port 443.

Once the request is authenticated and authorized, Kubernetes inspects the manifest, identifies which controller to send it to (e.g. the Deployments controller), and records the config in the cluster store as part of the cluster's overall desired state. Once this is done, the work gets scheduled on the cluster. This includes the hard work of pulling images, starting containers, building networks, and starting the application's processes.

Finally, Kubernetes utilizes background reconciliation loops that constantly monitor the state of the cluster. If the current state of the cluster varies from the desired state, Kubernetes will perform whatever tasks are necessary to reconcile the issue.

#### Pods

In the VMware world, the atomic unit of scheduling is the virtual machine (VM). In the Docker world, it's the container. In the Kubernetes world, it's the Pod.

It's true that Kubernetes runs containerized apps. However, you cannot run a container directly on a Kubernetes cluster – containers must always run inside of Pods.

##### Pods and Containers

The very first thing to understand is that the term Pod comes from a pod of whales – in the English language we call a group of whales a pod of whales. As the Docker logo is a whale, it makes sense that we call a group of containers a Pod.

The simplest model is to run a single container per Pod. However, there are advanced use-cases that run multiple containers inside a single Pod. Some examples of multi-container Pods:

- Service meshes

- Web containers supported by a helper container that pulls the latest content

- Container with tightly coupled log scraper

The point is, a Kubernetes Pod is a construct for running one or more containers.

##### Pod Anatomy

At the highest-level, a Pod is a ring-fenced environment to run containers. The Pod itself doesn't actually run anything, it's just a sandbox for hosting containers. Keeping it high level, you ring-fence an area of the host OS, build a network stack, create a bunch of kernel namespaces, and run one or more containers in it. That's a Pod.

If you're running multiple containers in a Pod, they all share the same environment. This includes things like the IPC namespace, shared memory, volumes, network stack and more.

This means all the containers share the Pod's single IP address. If two containers in the same Pod need to talk to each other, they can use ports on the Pod's localhost interface.

Multi-container Pods are ideal when you have requirements for tightly coupled containers that may need to share memory and storage. However, if you don't need to tightly couple your containers, you should put them in their own Pods and loosely couple them over the network. This keeps things clean by having each Pod dedicated to a single task.
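
For illustration, a tightly coupled two-container Pod might look like the following sketch. Both containers share the Pod's network stack, so the scraper could reach the web container over localhost (names and images are illustrative):

```
apiVersion: v1
kind: Pod
metadata:
  name: web-with-scraper            # illustrative name
spec:
  containers:
  - name: web
    image: example/web:1.0          # illustrative image
    ports:
    - containerPort: 8080
  - name: log-scraper               # tightly coupled helper container
    image: example/log-scraper:1.0  # illustrative image
```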

##### Pods as the Unit of Scaling

Pods are also the minimum unit of scheduling in Kubernetes. If you need to scale your app, you add or remove Pods. You do not scale by adding more containers to an existing Pod. Multi-container Pods are only for situations where two different, but complementary, containers need to share resources.

##### Pods - Atomic Operations

The deployment of a Pod is an atomic operation. This means that a Pod is either entirely deployed, or not deployed at all. There is never a situation where a partially deployed Pod will be servicing requests. The entire Pod either comes up and is put into service, or it doesn't, and it fails.

A single Pod can only be scheduled to a single node. This is also true of multi-container Pods – all containers in the same Pod will run on the same node.

##### Pod Lifecycle

Pods are mortal. They're created, they live, and they die. If they die unexpectedly, we don't bring them back to life. Instead, Kubernetes starts a new one in its place. However, even though the new Pod looks, smells, and feels like the old one, it isn't. It's a shiny new Pod with a shiny new ID and IP address.

This has implications on how we should design our applications. Don't design them so they are tightly coupled to a particular instance of a Pod. Instead, design them so that when Pods fail, a totally new one (with a new ID and IP address) can pop up somewhere else in the cluster and seamlessly take its place.

### Deployments

We normally deploy Pods indirectly as part of something bigger. Examples include Deployments, DaemonSets, and StatefulSets.

For example, a Deployment is a higher-level Kubernetes object that wraps around a particular Pod and adds features such as scaling, zero-downtime updates, and versioned rollbacks.

Behind the scenes, they implement a controller and a watch loop that is constantly observing the cluster making sure that current state matches desired state.

Deployments have existed in Kubernetes since version 1.2 and were promoted to GA (stable) in 1.9. You'll see them a lot.

### Services

We've just learned that Pods are mortal and can die. However, if they're managed via Deployments or DaemonSets, they get replaced when they fail. But replacements come with totally different IPs. This also happens when we perform scaling operations – scaling up adds new Pods with new IP addresses, whereas scaling down takes existing Pods away. Events like these cause a lot of IP churn.

The point we're making is that Pods are unreliable, which poses a challenge… Assume we've got a microservices app with a bunch of Pods performing video rendering. How will this work if other parts of the app that need to use the rendering service cannot rely on the rendering Pods being there when they need them?

This is where Services come into play. Services provide reliable networking for a set of Pods.

Digging into a bit more detail: Services are fully-fledged objects in the Kubernetes API – just like Pods and Deployments. They have a front-end consisting of a stable DNS name, IP address, and port. On the back-end, they load-balance across a dynamic set of Pods. As Pods come and go, the Service observes this, automatically updates itself, and continues to provide that stable networking endpoint.

The same applies if we scale the number of Pods up or down. New Pods are seamlessly added to the Service, whereas terminated Pods are seamlessly removed.

That's the job of a Service – it's a stable network abstraction point that provides TCP and UDP load-balancing across a dynamic set of Pods.

As they operate at the TCP and UDP layer, Services do not possess application intelligence and cannot provide application-layer routing. For that, you need an Ingress, which understands HTTP and provides host and path-based routing.

#### Connecting Pods to Services

Services use labels and a label selector to know which set of Pods to load-balance traffic to. The Service has a label selector that is a list of all the labels a Pod must possess in order for it to receive traffic from the Service.

Services only send traffic to healthy Pods. This means a Pod that is failing health-checks will not receive traffic from the Service.
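
A minimal sketch of a Service that load-balances TCP traffic to any healthy Pods carrying both the zone=prod and version=v1 labels (the Service name is illustrative):

```
apiVersion: v1
kind: Service
metadata:
  name: hello-svc           # illustrative name
spec:
  selector:                 # Pods must carry ALL of these labels to receive traffic
    zone: prod
    version: v1
  ports:
  - protocol: TCP
    port: 80                # port the Service listens on
    targetPort: 8080        # port the Pod's container exposes
```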

### Installing Kubernetes in local environment

You have two choices: Docker Desktop or minikube. For Docker Desktop, you will need to enable Kubernetes in your Docker installation. Installation is not covered here as it is well documented online; find a good guide to installing and setting up Kubernetes with Docker Desktop or minikube.

Once you have installed either one, you can verify your installation with the following commands:

- kubectl config get-contexts – lists the available Kubernetes contexts

- kubectl config use-context <NAME_OF_CONTEXT> – sets the current Kubernetes context

### kubectl

kubectl is the main Kubernetes command-line tool and is what you should use for your Kubernetes management activities. In fact, it's useful to think of kubectl as SSH for Kubernetes. It's available for Linux, Mac and Windows.

As it's the main command-line tool, it's important that you use a version that is no more than one minor version higher or lower than your cluster. For example, if your cluster is running Kubernetes 1.13.x, your kubectl should be between 1.12.x and 1.14.x.

At a high-level, kubectl converts user-friendly commands into the JSON payload required by the API server. It uses a configuration file to know which cluster and API server endpoint to POST to.

By default, the kubectl configuration file is called config and lives in $HOME/.kube. It contains definitions for:

- Clusters

- Users

- Contexts

Clusters lets you define multiple clusters, which is ideal if you plan on using a single workstation to manage several. Each cluster definition has a name, certificate info, and an API server endpoint.

Users let you define different users that might have different levels of permissions on each cluster. For example, you might have a dev user and an ops user, each with different permissions. Each user definition has a friendly name, a username, and a set of credentials.

Contexts bring together clusters and users under a friendly name. For example, you might have a context called deploy-prod that combines the deploy user credentials with the prod cluster definition. If you use kubectl with this context you will be POSTing commands to the API server of the prod cluster as the deploy user.
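
A sketch of what such a config file might look like, matching the deploy-prod example above (all names, paths, and the server URL are illustrative):

```
apiVersion: v1
kind: Config
clusters:
- name: prod                                  # illustrative cluster entry
  cluster:
    server: https://prod.example.com:443      # illustrative API server endpoint
    certificate-authority: /path/to/ca.crt    # illustrative path
users:
- name: deploy                                # illustrative user entry
  user:
    client-certificate: /path/to/deploy.crt   # illustrative path
    client-key: /path/to/deploy.key           # illustrative path
contexts:
- name: deploy-prod                           # combines the deploy user with the prod cluster
  context:
    cluster: prod
    user: deploy
current-context: deploy-prod
```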

You can view your kubectl config using the kubectl config view command. Sensitive data will be redacted from the output.

You can use kubectl config current-context to see your current context.

You can change the current/active context with kubectl config use-context.

### Pod Theory

The atomic unit of scheduling in the virtualization world is the Virtual Machine (VM). This means deploying applications in the virtualization world means scheduling them on VMs.

In the Docker world, the atomic unit is the container. This means deploying applications on Docker means deploying them inside of containers.

In the Kubernetes world, the atomic unit is the Pod. Ergo, deploying applications on Kubernetes means stamping them out in Pods.

This is fundamental to understanding Kubernetes, so be sure to tag it in your brain as important >> Virtualization does VMs, Docker does containers, and Kubernetes does Pods.

As Pods are the fundamental unit of deployment in Kubernetes, it's vital we understand how they work.

#### Pods vs Containers

A Pod is a shared execution environment for one or more containers. Quite often it's one container per Pod, but multi-container Pods are gaining in popularity. One use-case for multi-container Pods is co-scheduling tightly-coupled workloads. For example, two containers that share memory wouldn't work if they were scheduled on different nodes in the cluster. Other increasingly common use-cases include logging and service meshes.

#### How Do We Deploy Pods

To deploy a Pod to a Kubernetes cluster we define it in a manifest file and POST that manifest file to the API server. The control plane examines it, writes it to the cluster store as a record of intent, and the scheduler deploys it to a healthy node with enough available resources. This process is identical for single-container Pods and multi-container Pods.

#### Pod Theory Summary

1. Pods are the atomic unit of scheduling in Kubernetes

2. You can have more than one container in a Pod. Single-container Pods are the simplest, but multi-container Pods are ideal for containers that need to be tightly coupled. They're also great for logging and service meshes

3. Pods get scheduled on nodes – you can't schedule a single Pod instance to span multiple nodes

4. Pods are defined declaratively in a manifest file that is POSTed to the API server and assigned to nodes by the scheduler

5. We almost always deploy Pods via higher-level objects

### Pods Hands On

Following the Kubernetes mantra of composable infrastructure, we define Pods in manifest files, POST these to the API server, and let the scheduler instantiate them on the cluster.

Following is an example Pod manifest.

~~~
apiVersion: v1
kind: Pod
metadata:
  name: hello-pod
  labels:
    zone: prod
    version: v1
spec:
  containers:
  - name: hello-ctr
    image: nigelpoulton/k8sbook:latest
    ports:
    - containerPort: 8080
~~~

Save the file as pod.yml. Assuming you're running minikube as your current context, run the following command to apply the manifest:
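
```
kubectl apply -f pod.yml
```

You can then check the Pod's status with kubectl get pods.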

Let's step through what the YAML file is describing.

Straight away we can see four top-level fields:

- apiVersion

- kind

- metadata

- spec

The .apiVersion field tells us two things – the API group and the API version that will be used to create the object. Normally the format is <api-group>/<version>. For example, StorageClass objects are defined in v1 of the storage.k8s.io API group and are described in YAML files as storage.k8s.io/v1. However, Pods are defined in a special API group called the core group, which omits the api-group part, so we describe them in YAML files as just v1.

It's possible for a resource to be defined in multiple versions of an API group. For example, some-api-group/v1 and some-api-group/v2. In this case, the definition in the newer group would probably include additional features and fields that extend the capabilities of the resource. Think of the version field as defining the schema – newer is usually better. Interestingly, there may be occasions where you deploy an object via one version in the YAML file, but when you introspect it, the return values show it as another version. For example, you may deploy an object by specifying v1 in the YAML file, but when you run commands against it the returns might show it as v1beta1. This is normal behavior.

Anyway, Pods are currently defined at the v1 path.

The .kind field tells Kubernetes the type of object being deployed.

So far, we know we're deploying a Pod object as defined in v1 of the core API group.

The .metadata section is where we attach a name and labels. These help us identify the object in the cluster, and labels help us create loose couplings. We can also define the namespace that an object should be deployed to. Keeping things brief, namespaces allow us to logically divide clusters for management purposes. In the real world, it's highly recommended to use namespaces, however, you should not think of them as strong security boundaries.

The .metadata section of this Pod manifest is naming the Pod “hello-pod” and assigning it two labels. Labels are simple key-value pairs, but they're insanely powerful. We'll talk more about labels later as we build our knowledge.
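
As a quick taste of their power, once the Pod is running you can already select it by label:

```
kubectl get pods -l zone=prod    # lists only Pods carrying the zone=prod label
```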

As the .metadata section does not specify a namespace, the default namespace is assumed. It's not good practice to use the default namespace in the real world.

The .spec section is where we define any containers that will run in the Pod. Our example is deploying a Pod with a single container based on the nigelpoulton/k8sbook:latest image. It's calling the container hello-ctr and exposing it on port 8080.

If this was a multi-container Pod, we'd define additional containers in the .spec section.

