Kubernetes
R?dvan Murteza
Senior Information Technology System Administrator - Turkish Aerospace
Kubernetes ;is an open-source container orchestration system for automating software deployment, scaling, and management.Originally designed by Google, the project is now maintained by the Cloud Native Computing Foundation.
The name Kubernetes originates from Greek, meaning 'helmsman' or 'pilot'. Kubernetes is often abbreviated as K8s, counting the eight letters between the K and the s?
Kubernetes works with containerd and CRI-OIts suitability for running and managing large cloud-native workloads has led to widespread adoption of it in the data center. There are multiple distributions of this platform – from ISVs as well as hosted-on-cloud offerings from all the major public cloud vendors.
Concepts
Kubernetes defines a set of building blocks ("primitives") that collectively provide mechanisms that deploy, maintain, and scale applications based on CPU, memory or custom metrics. Kubernetes is loosely coupled and extensible to meet different workloads. The internal components as well as extensions and containers that run on Kubernetes rely on the Kubernetes API. The platform exerts its control over compute and storage resources by defining resources as Objects, which can then be managed as such.
Kubernetes follows the primary/replica architecture. The components of Kubernetes can be divided into those that manage an individual node and those that are part of the control plane.
Control plane
The Kubernetes master node handles the Kubernetes control plane of the cluster, managing its workload and directing communication across the system. The Kubernetes control plane consists of various components, each its own process, that can run both on a single master node or on multiple masters supporting high-availability clusters. The various components of the Kubernetes control plane are as follows:
etcd is a persistent, lightweight, distributed, key-value data store that CoreOS has developed. It reliably stores the configuration data of the cluster, representing the overall state of the cluster at any given point of time. etcd favors consistency over availability in the event of a network partition (see CAP theorem). The consistency is crucial for correctly scheduling and operating services.
The API server serves the Kubernetes API using JSON over HTTP, which provides both the internal and external interface to Kubernetes. The API server processes and validates REST requests and updates the state of the API objects in etcd, thereby allowing clients to configure workloads and containers across worker nodes. The API server uses etcd's watch API to monitor the cluster, roll out critical configuration changes, or restore any divergences of the state of the cluster back to what the deployer declared. As an example, the deployer may specify that three instances of a particular "pod" (see below) need to be running. etcd stores this fact. If the Deployment Controller finds that only two instances are running (conflicting with the etcd declaration), it schedules the creation of an additional instance of that pod.
The scheduler is the extensible component that selects on which node an unscheduled pod (the basic entity managed by the scheduler) runs, based on resource availability. The scheduler tracks resource use on each node to ensure that workload is not scheduled in excess of available resources. For this purpose, the scheduler must know the resource requirements, resource availability, and other user-provided constraints or policy directives such as quality-of-service, affinity vs. anti-affinity requirements, and data locality. The scheduler's role is to match resource "supply" to workload "demand".
A controller is a reconciliation loop that drives the actual cluster state toward the desired state, communicating with the API server to create, update, and delete the resources it manages (e.g., pods or service endpoints). One kind of controller is a Replication Controller, which handles replication and scaling by running a specified number of copies of a pod across the cluster. It also handles creating replacement pods if the underlying node fails. Other controllers that are part of the core Kubernetes system include a DaemonSet Controller for running exactly one pod on every machine (or some subset of machines), and a Job Controller for running pods that run to completion (e.g., as part of a batch job). Labels selectors that are part of the controller's definition specify the set of pods that a controller manages.
Nodes
A node, also known as a worker or a minion, is a machine where containers (workloads) are deployed. Every node in the cluster must run a container runtime such as containerd, as well as the below-mentioned components, for communication with the primary network configuration of these containers.
Kubelet is responsible for the running state of each node, ensuring that all containers on the node are healthy. It takes care of starting, stopping, and maintaining application containers organized into pods as directed by the control plane. Kubelet monitors the state of a pod, and if not in the desired state, the pod re-deploys to the same node. Node status is relayed every few seconds via heartbeat messages to the primary. Once the primary detects a node failure, the Replication Controller observes this state change and launches pods on other healthy nodes.
Kube-proxy is an implementation of a network proxy and a load balancer, and it supports the service abstraction along with other networking operations. It is responsible for routing traffic to the appropriate container based on IP and port number of the incoming request.
A container resides inside a pod. The container is the lowest level of a micro-service, which holds the running application, libraries, and their dependencies. Containers can be exposed to the world through an external IP address. Kubernetes has supported Docker containers since its first version. In July 2016 the rkt container engine was added.
Namespaces
In Kubernetes, namespaces are utilized to segregate the resources it handles into distinct and non-intersecting collections.[49] They are intended for use in environments with many users spread across multiple teams, or projects, or even separating environments like development, test, and production.
Pods
The basic scheduling unit in Kubernetes is a pod,[50] which consists of one or more containers that are guaranteed to be co-located on the same node. Each pod in Kubernetes is assigned a unique IP address within the cluster, allowing applications to use ports without the risk of conflict. Within the pod, all containers can reference each other.
A volume, such as a local disk directory or a network disk, can be defined within a pod and made accessible to the containers residing in it. Pods can be managed manually through the Kubernetes API, or their management can be delegated to a controller.Such volumes are also the basis for the Kubernetes features of ConfigMaps (to provide access to configuration through the file system visible to the container) and Secrets (to provide access to credentials needed to access remote resources securely, by providing those credentials on the file system visible only to authorized containers).
DaemonSets
Typically, the task of determining the node on which a pod should run is delegated to the Kubernetes Scheduler. However, in certain scenarios, it may be necessary to deploy a pod on every node in the cluster, which is particularly helpful for use cases involving log collection, ingress controllers, and storage services. This specific type of pod scheduling can be achieved by utilizing DaemonSets.
ReplicaSets
A ReplicaSet's purpose is to maintain a stable set of replica pods running at any given time. As such, it is often used to guarantee the availability of a specified number of identical Pods.
The ReplicaSets can also be said to be a grouping mechanism that lets Kubernetes maintain the number of instances that have been declared for a given pod. The definition of a ReplicaSet uses a selector, whose evaluation will result in identifying all pods that are associated with it.
Services
A Kubernetes service is a set of pods that work together, such as one tier of a multi-tier application. The set of pods that constitute a service are defined by a label selector. Kubernetes provides two modes of service discovery, using environmental variables or using Kubernetes DNS.Service discovery assigns a stable IP address and DNS name to the service, and load balances traffic in a round-robin manner to network connections of that IP address among the pods matching the selector (even as failures cause the pods to move from machine to machine).By default a service is exposed inside a cluster (e.g., back end pods might be grouped into a service, with requests from the front-end pods load-balanced among them), but a service can also be exposed outside a cluster (e.g., for clients to reach front-end pods).
Volumes
File systems in the Kubernetes container provide ephemeral storage, by default. This means that a restart of the pod will wipe out any data on such containers, and therefore, this form of storage is quite limiting in anything but trivial applications. A Kubernetes Volume provides persistent storage that exists for the lifetime of the pod itself. This storage can also be used as shared disk space for containers within the pod. Volumes are mounted at specific mount points within the container, which are defined by the pod configuration, and cannot mount onto other volumes or link to other volumes. The same volume can be mounted at different points in the file system tree by different containers.
ConfigMaps and secrets
A common application challenge is deciding where to store and manage configuration information, some of which may contain sensitive data. Configuration data can be anything as fine-grained as individual properties or coarse-grained information like entire configuration files or JSON / XML documents. Kubernetes provides two closely related mechanisms to deal with this need: "configmaps" and "secrets", both of which allow for configuration changes to be made without requiring an application build. The data from configmaps and secrets will be made available to every single instance of the application to which these objects have been bound via the deployment. A secret and/or a configmap is sent to a node only if a pod on that node requires it. Kubernetes will keep it in memory on that node. Once the pod that depends on the secret or configmap is deleted, the in-memory copy of all bound secrets and configmaps are deleted as well. The data is accessible to the pod through one of two ways:
as environment variables (which will be created by Kubernetes when the pod is started);
available on the container file system that is visible only from within the pod.
The data itself is stored on the master which is a highly secured machine which nobody should have login access to. The biggest difference between a secret and a configmap is that the content of the data in a secret is base64 encoded. Recent versions of Kubernetes have introduced support for encryption to be used as well. Secrets are often used to store data like certificates, credentials to work with image registries, passwords, and ssh keys.
领英推荐
StatefulSets
Scaling stateless applications is only a matter of adding more running pods. Stateful workloads are harder, because the state needs to be preserved if a pod is restarted. If the application is scaled up or down, the state may need to be redistributed. Databases are an example of stateful workloads. When run in high-availability mode, many databases come with the notion of a primary instance and secondary instances. In this case, the notion of ordering of instances is important. Other applications like Apache Kafka distribute the data amongst their brokers; hence, one broker is not the same as another. In this case, the notion of instance uniqueness is important.
StatefulSets sed to run stateful applications.
Replication controllers and deployments
A ReplicaSet declares the number of instances of a pod that is needed, and a Replication Controller manages the system so that the number of healthy pods that are running matches the number of pods declared in the ReplicaSet (determined by evaluating its selector).
Deployments are a higher-level management mechanism for ReplicaSets. While the Replication Controller manages the scale of the ReplicaSet, Deployments will manage what happens to the ReplicaSet – whether an update has to be rolled out, or rolled back, etc. When deployments are scaled up or down, this results in the declaration of the ReplicaSet changing - and this change in the declared state is managed by the Replication Controller.
Labels and selectors
Kubernetes enables clients (users or internal components) to attach keys called "labels" to any API object in the system, such as pods and nodes. Correspondingly, "label selectors" are queries against labels that resolve to matching objects.????When a service is defined, one can define the label selectors that will be used by the service router/load balancer to select the pod instances that the traffic will be routed to. Thus, simply changing the labels of the pods or changing the label selectors on the service can be used to control which pods get traffic and which don't, which can be used to support various deployment patterns like blue–green deployments or A/B testing. This capability to dynamically control how services utilize implementing resources provides a loose coupling within the infrastructure.
For example, if an application's pods have labels for a system tier (with values such as frontend, backend, for example) and a release_track (with values such as canary, production, for example), then an operation on all of backend and canary nodes can use a label selector, such as:
tier=backend AND release_track=canary
Add-ons
Add-ons operate just like any other application running within the cluster: they are implemented via pods and services, and are only different in that they implement features of the Kubernetes cluster. The pods may be managed by Deployments, ReplicationControllers, and so on. There are many add-ons, and the list is growing. Some of the more important are:
DNS
All Kubernetes clusters should have cluster DNS; it is a mandatory feature. Cluster DNS is a DNS server, in addition to the other DNS server(s) in your environment, which serves DNS records for Kubernetes services. Containers started by Kubernetes automatically include this DNS server in their DNS searches.
Web UI
This is a general purpose, web-based UI for Kubernetes clusters. It allows users to manage and troubleshoot applications running in the cluster, as well as the cluster itself.
Container Resource Monitoring
Providing a reliable application runtime, and being able to scale it up or down in response to workloads, means being able to continuously and effectively monitor workload performance. Container Resource Monitoring provides this capability by recording metrics about containers in a central database, and provides a UI for browsing that data. The cAdvisor is a component on a slave node that provides a limited metric monitoring capability. There are full metrics pipelines as well, such as Prometheus, which can meet most monitoring needs.
Container Cost Monitoring
Kubernetes cost monitoring apps allow to break down costs by pods, nodes, namespaces, and labels. Three crucial metrics to track are daily cloud spend, cost per provisioned and requested CPU, historical cost allocation.??????
Cluster-level logging
Logs should have a separate storage and lifecycle independent of nodes, pods, or containers. Otherwise, node or pod failures can cause loss of event data. The ability to do this is called cluster-level logging, and such mechanisms are responsible for saving container logs to a central log store with search/browsing interface. Kubernetes provides no native storage for log data, but one can integrate many existing logging solutions into the Kubernetes cluster.
Storage
Containers emerged as a way to make software portable. The container contains all the packages you need to run a service. The provided file system makes containers extremely portable and easy to use in development. A container can be moved from development to test or production with no or relatively few configuration changes.
Historically Kubernetes was suitable only for stateless services. However, many applications have a database, which requires persistence, which leads to the creation of persistent storage for Kubernetes. Implementing persistent storage for containers is one of the top challenges of Kubernetes administrators, DevOps and cloud engineers. Containers may be ephemeral, but more and more of their data is not, so one needs to ensure the data's survival in case of container termination or hardware failure. When deploying containers with Kubernetes or containerized applications, companies often realize that they need persistent storage. They need to provide fast and reliable storage for databases, root images and other data used by the containers.
In addition to the landscape, the Cloud Native Computing Foundation (CNCF), has published other information about Kubernetes Persistent Storage including a blog helping to define the container attached storage pattern. This pattern can be thought of as one that uses Kubernetes itself as a component of the storage system or service.???????
More information about the relative popularity of these and other approaches can be found on the CNCF's landscape survey as well, which showed that OpenEBS from MayaData and Rook – a storage orchestration project – were the two projects most likely to be in evaluation as of the Fall of 2019.??????
Container Attached Storage is a type of data storage that emerged as Kubernetes gained prominence. The Container Attached Storage approach or pattern relies on Kubernetes itself for certain capabilities while delivering primarily block, file, object and interfaces to workloads running on Kubernetes.????
Common attributes of Container Attached Storage include the use of extensions to Kubernetes, such as custom resource definitions, and the use of Kubernetes itself for functions that otherwise would be separately developed and deployed for storage or data management. Examples of functionality delivered by custom resource definitions or by Kubernetes itself include retry logic, delivered by Kubernetes itself, and the creation and maintenance of an inventory of available storage media and volumes, typically delivered via a custom resource definition.??????
Container Storage Interface (CSI)
In Kubernetes version 1.9, the initial Alpha release of Container Storage Interface (CSI) was introduced.??Previously, storage volume plug-ins were included in the Kubernetes distribution. By creating a standardized CSI, the code required to interface with external storage systems was separated from the core Kubernetes code base. Just one year later, the CSI feature was made Generally Available (GA) in Kubernetes.???
API
A key component of the Kubernetes control plane is the API Server, which exposes an HTTP API that can be invoked by other parts of the cluster as well as end users and external components. This API is a REST API and is declarative in nature. There are two kinds of API resources. Most of the API resources in the Kubernetes API are objects. These represent a concrete instance of a concept on the cluster, like a pod or namespace. A small number of API resource types are "virtual". These represent operations rather than objects, such as a permission check, using the "subjectaccessreviews" resource. API resources that correspond to objects will be represented in the cluster with unique identifiers for the objects. Virtual resources do not have unique identifiers.
Operators
Kubernetes can be extended using Custom Resources. These API resources represent objects that are not part of the standard Kubernetes product. These resources can appear and disappear in a running cluster through dynamic registration. Cluster administrators can update Custom Resources independently of the cluster.
Custom Controllers are another extension mechanism. These interact with Custom Resources, and allow for a true declarative API that allows for the lifecycle management of Custom Resource that is aligned with the way that Kubernetes itself is designed. The combination of Custom Resources and Custom Controllers are often referred to as an (Kubernetes) Operator. The key use case for Operators are to capture the aim of a human operator who is managing a service or set of services and to implement them using automation, and with a declarative API supporting this automation. Human operators who look after specific applications and services have deep knowledge of how the system ought to behave, how to deploy it, and how to react if there are problems. Examples of problems solved by Operators include taking and restoring backups of that application's state, and handling upgrades of the application code alongside related changes such as database schemas or extra configuration settings.