
Kubernetes: Some lesser-known concepts which were really a deal-breaker in the production environment

Vaibhav Jain

Director DevOps & Cloud | Almosafer - Seera Group

Published: March 22, 2020


Kubernetes 1.18 is about to be released!

The paradigm shift towards microservices and containers makes Kubernetes a necessity for every company that wants to compete with fast-moving technology.

Many of us work with Kubernetes, but few of us run it to its full potential.

I started my journey with Kubernetes in December 2016 when it was version 1.5. Things have moved way faster than I expected.

In this post, I won't go through the basic objects that are the essential building blocks of Kubernetes; thousands of blogs on Kubernetes best practices already cover them.

I am writing this blog to discuss some of the lesser-known features of Kubernetes, which I found very helpful for running Kubernetes in the production environment.

These are the features that look most exciting to me:

Init Containers

Like any other container in a pod, an init container runs a specific set of tasks, in this case tasks that the main container depends on. Unlike normal containers, init containers are not long-running processes: each one runs to completion, in order, before the main containers start, so they exist only to prepare the pod at startup.
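To make the ordering concrete, here is a minimal sketch of a pod whose main container starts only after an init container has finished. The pod name, the container names and the service name mydb are hypothetical and only serve the illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod              # hypothetical name
spec:
  initContainers:
  # Runs to completion first; the main container is not started until it exits successfully.
  - name: wait-for-db
    image: busybox:1.30
    # "mydb" is an assumed Service name; the loop ends once it resolves in DNS.
    command: ["sh", "-c", "until nslookup mydb; do echo waiting for mydb; sleep 2; done"]
  containers:
  - name: myapp
    image: busybox:1.30
    command: ["sh", "-c", "echo app started && sleep 3600"]
```

If the init container fails, the kubelet restarts it according to the pod's restartPolicy, and the main container keeps waiting.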

Use case

In our Prometheus StatefulSet, we need to change the ownership of the datastore directory. To achieve this, we used an init container.

initContainers:
  - name: "init-chown-data"
    image: "busybox:1.30"
    imagePullPolicy: "IfNotPresent"
    command: ["chown", "-R", "65534:65534", "/prometheus"]
    volumeMounts:
    - name: prometheus-storage
      mountPath: "/prometheus"
      subPath: ""

Runtime Class

RuntimeClass is a feature for selecting the container runtime configuration. The container runtime configuration is used to run a Pod’s containers.

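Each RuntimeClass names a handler, which maps to a configuration of the container runtime (CRI implementation) installed on the node. Some of the common CRI implementations are: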

dockershim
containerd
CRI-O

Use case

It is useful if you have heterogeneous worker nodes (Windows/Linux) and you want to place pods based on the OS so that they can use OS-specific features and configuration.

We labeled our nodes based on the OS and then created a RuntimeClass object.

apiVersion: node.k8s.io/v1beta1
kind: RuntimeClass
metadata:
  name: windows-1903
handler: 'docker'
scheduling:
  nodeSelector:
    kubernetes.io/os: 'windows'
    kubernetes.io/arch: 'amd64'
    node.kubernetes.io/windows-build: '10.0.18362'
  tolerations:
  - effect: NoSchedule
    key: windows
    operator: Equal
    value: "true"
 
---


apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  runtimeClassName: windows-1903

Container lifecycle hooks

The hooks enable containers to be aware of events in their management lifecycle and run code implemented in a handler when the corresponding lifecycle hook is executed.

In layman's terms, you can run a set of actions right after a container starts or just before it is terminated.

Kubernetes supports 2 kinds of hooks:

PostStart
PreStop

Use case

We have injected a preStop hook with the "sleep 3" command into all our pods. During pod termination, an application that is still serving its final requests gets some breathing room to complete them instead of returning a 503 status code.
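A preStop hook of this kind could look roughly like the sketch below. The pod name, container name and image are assumptions for illustration; only the "sleep 3" command comes from our setup.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: prestop-demo           # hypothetical name
spec:
  # The time spent in preStop counts against this grace period (30s is the default).
  terminationGracePeriodSeconds: 30
  containers:
  - name: app                  # hypothetical container
    image: nginx:1.17          # assumed image; it must ship a sleep binary
    lifecycle:
      preStop:
        exec:
          # Delay SIGTERM by 3 seconds so in-flight requests can finish.
          command: ["sleep", "3"]
```

The container receives its termination signal only after the hook completes, so the sleep has to stay well below terminationGracePeriodSeconds.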

Pod presets

Pod presets are used to inject additional runtime requirements into a Pod at creation time. It helps avoid passing redundant information on every pod definition. It is mainly used to inject some environment variables or volume/volume mounts to a specific set of pods.

Use case

We inject a set of environment variables, such as log_level and log_format, into every pod running in our cluster, regardless of application type, technology, or behavior.
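As a rough sketch, a preset for such variables might look like the following. PodPreset is an alpha API (settings.k8s.io/v1alpha1) that has to be enabled explicitly on the API server, and it was removed in Kubernetes 1.20, so this only applies to older clusters. The preset name, the selector label and the variable values are assumptions.

```yaml
apiVersion: settings.k8s.io/v1alpha1
kind: PodPreset
metadata:
  name: logging-defaults       # hypothetical name
spec:
  # Pods in the same namespace that carry this (assumed) label receive the preset.
  selector:
    matchLabels:
      inject-logging: "true"
  env:
  - name: log_level
    value: "info"              # assumed value
  - name: log_format
    value: "json"              # assumed value
```

PodPresets are namespaced, so covering a whole cluster means creating one per namespace.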

I found an interesting post explaining the pod preset.

Pod topology spread constraints

This feature is a lifesaver in case of a zone failure. It distributes your pods evenly across zones (for example AWS AZs) to achieve high availability and zone fault tolerance. It works on node labels and is not limited to zones, so you can define a custom topology based on your system's needs.

Use case

All our Kubernetes nodes carry a set of labels. A few of them are:

node=<name of the node, mix of node IP>
zone=<name of AZ>
region=<name of region>

A sample pod definition file:

kind: Pod
apiVersion: v1
metadata:
  name: mypod
  labels:
    foo: bar
spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: zone
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        foo: bar
  containers:
  - name: pause
    image: k8s.gcr.io/pause:3.1

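This spreads the pods labeled foo: bar across the values of the zone label on your cluster nodes, allowing a difference of at most one pod between zones (maxSkew: 1). With whenUnsatisfiable: ScheduleAnyway the constraint is only a preference; DoNotSchedule makes it a hard requirement. All parameters are configurable based on your needs.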
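Because the constraint works on any node label, several constraints can be combined. The sketch below, which assumes the zone and node labels listed above, enforces an even spread across zones and additionally prefers an even spread across individual nodes.

```yaml
kind: Pod
apiVersion: v1
metadata:
  name: mypod
  labels:
    foo: bar
spec:
  topologySpreadConstraints:
  # Hard requirement: zones may differ by at most one matching pod.
  - maxSkew: 1
    topologyKey: zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        foo: bar
  # Soft preference: also try to balance across individual nodes.
  - maxSkew: 1
    topologyKey: node
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        foo: bar
  containers:
  - name: pause
    image: k8s.gcr.io/pause:3.1
```

With DoNotSchedule, a pod that cannot satisfy the zone constraint stays Pending, which is worth keeping in mind for small clusters.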

I found Kubernetes documentation really helpful explaining topologySpreadConstraints.

PodDisruptionBudget

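This is a nice feature for applications that need a specific number of pods to be available at all times. A PodDisruptionBudget limits how many pods of an application can be taken down at once by voluntary disruptions such as node drains or cluster upgrades.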

For example, a quorum-based application would like to ensure that the number of replicas running never goes below the number needed to form a quorum.

Use case

We run Kafka and ZooKeeper on Kubernetes. ZooKeeper needs to form a quorum to be available to the Kafka brokers.
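A minimal sketch of a budget for such a setup is shown below, assuming a three-node ZooKeeper ensemble whose pods carry the label app: zookeeper. The object name, the label and the replica count are assumptions, not our exact configuration.

```yaml
# policy/v1beta1 matches clusters of the 1.18 era; newer clusters use policy/v1.
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: zookeeper-pdb          # hypothetical name
spec:
  # With 3 replicas, a quorum needs 2, so at most one pod may be evicted at a time.
  minAvailable: 2
  selector:
    matchLabels:
      app: zookeeper           # assumed pod label
```

A node drain then evicts ZooKeeper pods one at a time and waits whenever an eviction would drop the ensemble below two available pods.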

I found an interesting blog explaining PDB.

Priority Class

As the name suggests, using priority class we can define the criticality of the pod in the system and give pods a preference for scheduling.

According to the Kubernetes documentation:

"When Pod priority is enabled, the scheduler orders pending Pods by their priority and a pending Pod is placed ahead of other pending Pods with lower priority in the scheduling queue. As a result, the higher priority Pod may be scheduled sooner than Pods with a lower priority if its scheduling requirements are met. If such Pod cannot be scheduled, the scheduler will continue and tries to schedule other lower-priority Pods."

By default, a Kubernetes installation comes with two priority classes:

system-cluster-critical
system-node-critical

You may use any of these classes in your pod definition file or you can create your own priority class.
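For a custom class, a minimal sketch could look like this. The class name, the value and the pod are hypothetical; higher values mean higher priority.

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority          # hypothetical name
value: 1000000                 # assumed value; higher wins
globalDefault: false           # do not apply to pods without a priorityClassName
description: "For workloads that should be scheduled ahead of regular pods."
---
apiVersion: v1
kind: Pod
metadata:
  name: important-pod          # hypothetical name
spec:
  priorityClassName: high-priority
  containers:
  - name: pause
    image: k8s.gcr.io/pause:3.1
```

The pod refers to the class by name through priorityClassName, and the scheduler resolves that to the numeric value.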

Use case

Our critical components, such as monitoring, logging, and tracing, are all deployed with the system-node-critical class, so whenever nodes are disrupted, these pods get scheduling preference over any non-critical pod.

I found a very interesting blog explaining the Priority class in detail.

Limit Range

This feature is useful if you want to restrict developers or teams to a certain amount of resources so that they do not eat up everything that is available.

I prefer to use a limit range together with a namespace quota: we define the CPU and memory available to a namespace and then set a limit range so that a single pod cannot consume all the resources in that namespace.

apiVersion: v1
kind: LimitRange
metadata:
  name: limit-mem-cpu-per-container
spec:
  limits:
  - max:
      cpu: "80m"
      memory: "100Mi"
    min:
      cpu: "40m"
      memory: "50Mi"
    default:
      cpu: "50m"
      memory: "60Mi"
    type: Container

---

apiVersion: v1
kind: Pod
metadata:
  name: busybox2
spec:
  containers:
  - name: busybox-cnt01
    image: busybox
    command: ["/bin/sh"]
    args: ["-c", "while true; do echo hello from cnt01; sleep 10;done"]
    resources:
      requests:
        memory: "10Mi"
        cpu: "10m"
      limits:
        memory: "20Mi"
        cpu: "50m"
  - name: busybox-cnt02
    image: busybox
    command: ["/bin/sh"]
    args: ["-c", "while true; do echo hello from cnt02; sleep 10;done"]
    resources:
      requests:
        memory: "10Mi"
        cpu: "10m"
  - name: busybox-cnt03
    image: busybox
    command: ["/bin/sh"]
    args: ["-c", "while true; do echo hello from cnt03; sleep 10;done"]
    resources:
      limits:
        memory: "20Mi"
        cpu: "50m"
  - name: busybox-cnt04
    image: busybox
    command: ["/bin/sh"]
    args: ["-c", "while true; do echo hello from cnt04; sleep 10;done"]

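When a container's requests or limits fall outside the configured min and max, the API server rejects the pod with an error at creation time. In the pod above, for example, busybox-cnt01 requests 10m CPU and 10Mi memory, which is below the minimum of 40m and 50Mi.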
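The namespace quota that I pair with the limit range could be sketched as follows. The quota name, the namespace and all the amounts are assumptions chosen for illustration.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota             # hypothetical name
  namespace: team-a            # hypothetical namespace
spec:
  hard:
    # Totals across all pods in the namespace; the values are assumed.
    requests.cpu: "1"
    requests.memory: 1Gi
    limits.cpu: "2"
    limits.memory: 2Gi
```

The quota caps the total for the namespace, while the limit range bounds each individual container and supplies defaults for containers that do not declare their own values.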

These are the Kubernetes features that I feel are underrated or lesser-known but make a real difference in production.

Apart from these, one more feature, Custom Resource Definitions (CRDs), is super beneficial. I will write a separate post on them.

I hope you found some useful takeaways.

Stay motivated, cheers!!

Credits: Kubernetes documentation, Medium blogs, and Google Photos.