Kubernetes: Some lesser-known concepts that were real game changers in our production environment
Kubernetes 1.18 is about to be released!
The paradigm shift towards microservices and containers makes Kubernetes a necessity for every company that wants to compete with fast-moving technology.
We are all working with Kubernetes, but few of us are running it to its full potential.
I started my journey with Kubernetes in December 2016 when it was version 1.5. Things have moved way faster than I expected.
In this blog, I won't go through the basic objects that are the essential building blocks of Kubernetes; thousands of blogs on the internet already cover them under Kubernetes best practices.
Instead, I want to discuss some of the lesser-known features of Kubernetes that I found very helpful for running it in a production environment.
These are the features that look most exciting to me:
Init Containers
Like any container in a pod, init containers run a specific set of tasks that the main container depends on. Unlike normal containers, init containers are not long-running processes: they run to completion, one after another, before the main containers start, and exist only to prepare the pod at startup.
Use case
In our Prometheus StatefulSet, we need to change the ownership of the datastore directory. To achieve this, we used an init container.
initContainers:
- name: "init-chown-data"
  image: "busybox:1.30"
  imagePullPolicy: "IfNotPresent"
  command: ["chown", "-R", "65534:65534", "/prometheus"]
  volumeMounts:
  - name: prometheus-storage
    mountPath: "/prometheus"
    subPath: ""
Runtime Class
RuntimeClass is a feature for selecting the container runtime configuration. The container runtime configuration is used to run a Pod’s containers.
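A RuntimeClass points at a handler configured in the node's container runtime (CRI implementation). Some of the available container runtimes are: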
- dockershim
- containerd
- CRI-O
Use case
It is useful if you have heterogeneous worker nodes (Windows/Linux) and want to place pods based on the OS so that they can use OS-specific features and configuration.
We labeled our nodes based on the OS and then created a RuntimeClass object.
apiVersion: node.k8s.io/v1beta1
kind: RuntimeClass
metadata:
  name: windows-1903
handler: 'docker'
scheduling:
  nodeSelector:
    kubernetes.io/os: 'windows'
    kubernetes.io/arch: 'amd64'
    node.kubernetes.io/windows-build: '10.0.18362'
  tolerations:
  - effect: NoSchedule
    key: windows
    operator: Equal
    value: "true"
---
apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  runtimeClassName: windows-1903
Container lifecycle hooks
The hooks enable containers to be aware of events in their management lifecycle and run code implemented in a handler when the corresponding lifecycle hook is executed.
In layman's terms, you can run a set of actions right after a container starts or just before it is terminated.
Kubernetes supports two kinds of hooks:
- PostStart
- PreStop
Use case
We inject a preStop hook with the "sleep 3" command into all our pods. During pod termination, an application that is still serving its final requests gets some breathing room to complete them instead of returning a 503 status code.
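A minimal sketch of such a hook is shown here. The pod name, container name, and image are placeholders rather than our actual manifests, and the example assumes the image ships a sleep binary.

```yaml
# Hypothetical pod: the name and image are placeholders for illustration.
apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  # The grace period also covers the preStop hook, so it must be
  # longer than the sleep or the container is killed before it can drain.
  terminationGracePeriodSeconds: 30
  containers:
  - name: web
    image: nginx:1.17
    lifecycle:
      preStop:
        exec:
          # Runs before the TERM signal is sent to the container.
          command: ["sleep", "3"]
```

The hook only delays the TERM signal. The application still has to finish in-flight requests and shut down cleanly once the signal arrives.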
Pod presets
Pod presets inject additional runtime requirements into a Pod at creation time. They help avoid repeating the same information in every pod definition, and are mainly used to inject environment variables, volumes, or volume mounts into a specific set of pods.
Use case
We inject a set of environment variables, such as log_level and log_format, into every pod running in our cluster, regardless of application type, technology, or behavior.
I found an interesting post explaining the pod preset.
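As a rough sketch, a preset for the logging variables above could look like the following. The preset name, namespace, label, and values are assumptions made for illustration. PodPreset is an alpha API (settings.k8s.io/v1alpha1) that has to be enabled explicitly on the API server together with its admission plugin, and it was removed in Kubernetes 1.20, so check your cluster version before relying on it.

```yaml
# Hypothetical preset: name, namespace, label, and values are placeholders.
apiVersion: settings.k8s.io/v1alpha1
kind: PodPreset
metadata:
  name: logging-defaults
  namespace: default        # presets are namespaced
spec:
  selector:
    matchLabels:
      inject-logging: "true"   # assumed label carried by the target pods
  env:
  - name: log_level
    value: "info"
  - name: log_format
    value: "json"
```

Pods in the same namespace that carry the matching label get both variables added to their containers at admission time.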
Pod topology spread constraints
This feature is a lifesaver in case of a zone failure. It distributes your pods evenly across different zones (for example, AWS AZs) to achieve high availability and zone fault tolerance. It works on node labels and is not limited to zones: you can define a custom topology based on your system's needs.
Use case
All our Kubernetes nodes carry a set of labels. A few of them are:
- node=<name of the node, mix of node IP>
- zone=<name of AZ>
- region=<name of region>
A sample pod definition file:
kind: Pod
apiVersion: v1
metadata:
  name: mypod
  labels:
    foo: bar
spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: zone
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        foo: bar
  containers:
  - name: pause
    image: k8s.gcr.io/pause:3.1
This spreads the pods labeled foo=bar across the values of the zone label on your cluster nodes. A maxSkew of 1 asks for the pod count in any two zones to differ by at most one, and ScheduleAnyway makes that a soft preference rather than a hard requirement. All parameters are configurable based on your needs.
I found the Kubernetes documentation really helpful in explaining topologySpreadConstraints.
PodDisruptionBudget
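This feature helps when an application needs a specific number of pods to be available at all times. A PodDisruptionBudget limits how many pods of an application can be down at once because of voluntary disruptions such as node drains.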
For example, a quorum-based application would like to ensure that the number of replicas running never goes below the number needed to form a quorum.
Use case
We run Kafka and ZooKeeper on Kubernetes. ZooKeeper needs to form a quorum to be available to the Kafka brokers.
I found an interesting blog explaining PDB.
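For illustration, a budget for the ZooKeeper use case might look like the sketch below. It assumes a three-node ensemble whose pods carry the label app=zookeeper. The object name, the label, and the replica count are assumptions, not our actual configuration.

```yaml
# Hypothetical budget for an assumed three-node ZooKeeper ensemble.
apiVersion: policy/v1beta1   # policy/v1 on Kubernetes 1.21 and later
kind: PodDisruptionBudget
metadata:
  name: zk-pdb
spec:
  # With three replicas, two pods are enough to keep a quorum.
  minAvailable: 2
  selector:
    matchLabels:
      app: zookeeper         # assumed pod label
```

With this in place, a node drain that would take a second ZooKeeper pod down is blocked until the first evicted pod is running again.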
Priority Class
As the name suggests, using priority class we can define the criticality of the pod in the system and give pods a preference for scheduling.
According to the Kubernetes documentation:
"When Pod priority is enabled, the scheduler orders pending Pods by their priority and a pending Pod is placed ahead of other pending Pods with lower priority in the scheduling queue. As a result, the higher priority Pod may be scheduled sooner than Pods with a lower priority if its scheduling requirements are met. If such Pod cannot be scheduled, the scheduler will continue and tries to schedule other lower-priority Pods."
By default, a Kubernetes installation comes with two priority classes:
- system-cluster-critical
- system-node-critical
You may use any of these classes in your pod definition file or you can create your own priority class.
Use case
Our critical components, such as monitoring, logging, and tracing, are all deployed with the system-node-critical class so that, whenever nodes are disrupted, these pods are scheduled ahead of any non-critical pod.
I found a very interesting blog explaining the Priority class in detail.
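If you would rather not reuse the built-in classes, a custom class is a small object. The sketch below is only an illustration: the class name, value, description, pod name, and image are all made up for the example.

```yaml
# Hypothetical custom class: name, value, and description are placeholders.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: observability-critical
value: 1000000               # higher value means higher priority
globalDefault: false         # do not apply to pods that name no class
description: "Monitoring, logging, and tracing agents."
---
# Hypothetical pod that opts into the class by name.
apiVersion: v1
kind: Pod
metadata:
  name: log-agent
spec:
  priorityClassName: observability-critical
  containers:
  - name: agent
    image: busybox:1.30
    command: ["sh", "-c", "while true; do sleep 3600; done"]
```

A pod opts in through priorityClassName. Pods that name no class get the value of the class marked as globalDefault, or zero if there is none.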
Limit Range
This feature is useful if you want to cap the resources that developers or teams can use so that nobody eats up all the available capacity.
I prefer to use a limit range together with a namespace quota: we define the CPU and memory available to a namespace and then set a limit range so that a single pod cannot eat all the resources in that namespace.
apiVersion: v1
kind: LimitRange
metadata:
  name: limit-mem-cpu-per-container
spec:
  limits:
  - max:
      cpu: "80m"
      memory: "100Mi"
    min:
      cpu: "40m"
      memory: "50Mi"
    default:
      cpu: "50m"
      memory: "60Mi"
    type: Container
---
apiVersion: v1
kind: Pod
metadata:
  name: busybox2
spec:
  containers:
  - name: busybox-cnt01
    image: busybox
    command: ["/bin/sh"]
    args: ["-c", "while true; do echo hello from cnt01; sleep 10;done"]
    resources:
      requests:
        memory: "10Mi"
        cpu: "10m"
      limits:
        memory: "20Mi"
        cpu: "50m"
  - name: busybox-cnt02
    image: busybox
    command: ["/bin/sh"]
    args: ["-c", "while true; do echo hello from cnt02; sleep 10;done"]
    resources:
      requests:
        memory: "10Mi"
        cpu: "10m"
  - name: busybox-cnt03
    image: busybox
    command: ["/bin/sh"]
    args: ["-c", "while true; do echo hello from cnt03; sleep 10;done"]
    resources:
      limits:
        memory: "20Mi"
        cpu: "50m"
  - name: busybox-cnt04
    image: busybox
    command: ["/bin/sh"]
    args: ["-c", "while true; do echo hello from cnt04; sleep 10;done"]
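When a container falls outside these bounds, the API server rejects the pod with an error at creation time. In the example above, busybox-cnt01 requests 10m CPU and 10Mi memory, which is below the 40m and 50Mi minimums, so the pod is not admitted.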
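The namespace quota that I pair with the limit range is a separate ResourceQuota object. Here is a minimal sketch; the quota name, the namespace, and all the numbers are placeholders chosen for the example.

```yaml
# Hypothetical quota: name, namespace, and amounts are placeholders.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "1"        # sum of CPU requests across the namespace
    requests.memory: 1Gi
    limits.cpu: "2"          # sum of CPU limits across the namespace
    limits.memory: 2Gi
    pods: "20"               # maximum number of pods in the namespace
```

Once a quota constrains CPU or memory, every new pod in that namespace must declare requests and limits for those resources. The default values in the limit range fill them in for containers that omit them.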
These are the Kubernetes features that I feel are underrated or lesser-known but are real game changers.
Apart from these, Custom Resource Definitions (CRDs) are also super beneficial. I will write a separate post on them.
I hope you found some good takeaways.
Stay motivated, cheers!!
Credits: Kubernetes documentation, medium blogs, and google photos.