What you should know about autoscaling in Kubernetes

What is autoscaling?

Autoscaling is one of the reasons I fell in love with Kubernetes. Kubernetes is built to scale workloads with ease. But what is autoscaling?

Imagine having a party in your house with all your family and friends attending. If you are from Nigeria, that number can easily reach 500. So you need approximately 500 chairs, 500 cutlery sets, glasses, 50 tables, and almost everything else. So you go to the market and buy all of these. On the “D” day, as we call it, you find that only 200 people came, and you have wasted money on the things you bought for the extra 300. Or imagine 1,000 people show up instead; then you don’t have enough to take care of that number.

A few days to the “D” day, you find that there is an event vendor close by from whom you can rent all these things. He asks you to pay for only the things you use. So you can take 200, and if your visitors exceed 200, you can rent more to make up for it. If the number is less, then you can return the ones you do not need, and you won’t need to pay for them. So you have saved yourself the disappointment of not having enough resources to cater to your guests, and the cost of buying or renting more than you require. This example illustrates the basic concept of scaling.

Scaling is the ability of a system to increase or decrease in size. Autoscaling is the ability of a system to scale based on resource demand or pressure without manual interference.

Cloud computing makes scaling easy and cost-effective. Before cloud computing, if you wanted to scale your resources, you needed to buy physical memory, processors, or entire machines, and then worry about space and maintenance. With cloud computing, you can scale your resources in seconds. With Kubernetes, it’s like magic.

We will look at three types of autoscaling:

  1. Vertical autoscaling (VPA)
  2. Horizontal autoscaling (HPA)
  3. Cluster autoscaling

Vertical Pod Autoscaler (VPA)

In vertical autoscaling, we either increase or decrease the amount of CPU or memory assigned to a pod automatically.

A vertical autoscaler monitors the resource usage of a pod and recommends or updates the pod’s resource requests and limits based on the configuration you pass. For example, if a pod requests 200m CPU and 500Mi memory but actually uses up to 350m CPU and 400Mi memory, the VPA might raise the CPU request to 500m and lower the memory request closer to 400Mi, depending on the configuration you pass.
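
To make this concrete, here is a minimal sketch of a VerticalPodAutoscaler manifest. Note that the VPA is an add-on you install separately, and the Deployment name my-app is a hypothetical placeholder:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:                  # the workload whose pods the VPA manages
    apiVersion: apps/v1
    kind: Deployment
    name: my-app              # hypothetical deployment name
  updatePolicy:
    updateMode: "Auto"        # "Auto" applies changes; "Off" only recommends
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: "1"
          memory: 1Gi

With updateMode: "Auto", the VPA applies new requests by evicting and recreating pods, so make sure your workload tolerates restarts.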

Horizontal Pod Autoscaler (HPA)

A horizontal pod autoscaler automatically increases or decreases the number of pods in a deployment based on defined configurations.

The HPA monitors metrics such as CPU, memory, or custom metrics, and scales the number of pods based on a predefined threshold. For example, if the average CPU usage of the pods exceeds 50% of their requests, the HPA may add more pods to distribute the load.
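
As a sketch, an HPA that targets 50% average CPU utilization might look like the following. Utilization is measured against the pods’ CPU requests, and the Deployment name my-app is hypothetical:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:             # the workload to scale
    apiVersion: apps/v1
    kind: Deployment
    name: my-app              # hypothetical deployment name
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50   # add pods when average CPU exceeds 50% of requests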

Cluster Autoscaler

The cluster autoscaler automatically increases or reduces the number of nodes running in a cluster based on the resource demands of the pods in the cluster.

One way the cluster autoscaler is triggered is the presence of unschedulable pods. Unschedulable pods are pods that do not meet the conditions necessary for them to be placed on any node in the cluster. But what could make a pod unschedulable?

There are several reasons why a pod could be unschedulable, but I will just mention a few:

  1. Resource constraints: a pod can be unschedulable if no node has enough free CPU or memory to satisfy the pod’s requests. For example, if a node has 30Gi of memory and only 1Gi remains after other pods have been scheduled, a new pod requesting 2Gi of memory cannot be scheduled on that node.
  2. Affinity rules: these determine whether a pod can be scheduled on a particular node or not (see the sketch after this list):
     i. Node affinity rules specify that a pod must be scheduled on a node with certain labels.
     ii. Node anti-affinity rules specify that a pod cannot be scheduled on a node with certain labels.
     iii. Pod affinity rules specify that a pod should be scheduled on the same node as certain other pods.
     iv. Pod anti-affinity rules specify that a pod should not be scheduled on the same node as certain other pods.
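
For illustration, a node affinity rule in a pod spec might look like this sketch, where the disktype: ssd label is a hypothetical example:

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:   # hard rule: the pod stays Pending if no node matches
        nodeSelectorTerms:
          - matchExpressions:
              - key: disktype
                operator: In
                values:
                  - ssd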

Other factors, such as taints and tolerations, can also make a pod unschedulable: a pod cannot be scheduled on a tainted node unless it has a matching toleration.
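
For example, a node could be tainted with kubectl taint nodes worker-1 dedicated=gpu:NoSchedule (a hypothetical node name and key), and only pods carrying a matching toleration, like the sketch below, can be scheduled on it:

tolerations:
  - key: "dedicated"          # must match the taint's key
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"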

The cluster autoscaler checks whether adding a new node would make the unschedulable pods schedulable, and if so, it provisions one. Conversely, if the autoscaler notices an underutilized node whose pods could fit on other nodes, it moves those pods elsewhere and then deletes the node it has freed.
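
The details depend on your cloud provider, but as a rough sketch, scale-down behaviour is typically tuned through flags on the cluster-autoscaler deployment. The provider, node group name, and values below are illustrative assumptions:

command:
  - ./cluster-autoscaler
  - --cloud-provider=aws                      # assumes AWS; use your own provider
  - --nodes=1:10:my-node-group                # min:max:name of a hypothetical node group
  - --scale-down-utilization-threshold=0.5    # nodes under 50% utilization become scale-down candidates
  - --scale-down-unneeded-time=10m            # how long a node must stay underutilized before removal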

What are the benefits of autoscaling?

  • Cost saving: you only pay for what you use. If a node is no longer needed, it is deleted; if resource utilization is low, excess pods are removed. This saves money.
  • Minimized downtime: because your resources are scaled automatically, your services are far less likely to crash under load. Once a certain threshold is reached, the autoscaler spins up more resources to prevent failures.
  • Proper resource management: when deploying services to your cluster, you do not have to guess at requests and limits. The VPA can help you determine them based on accurate observation of resource usage.

What should you watch out for when setting up autoscaling?

  • It is best practice to specify resource requests and limits for your workloads. Autoscalers work with metrics; in particular, the HPA computes utilization as a percentage of the pods’ requests, so it cannot make sensible scaling decisions without them.
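
As a minimal sketch, requests and limits on a container look like this (the values are illustrative):

resources:
  requests:                   # reserved by the scheduler; HPA utilization is measured against these
    cpu: 200m
    memory: 256Mi
  limits:                     # hard caps enforced at runtime
    cpu: 500m
    memory: 512Mi
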
  • There may be a race condition between ArgoCD and your VPA. ArgoCD will constantly try to reset your resource requests and limits to match what is in your code repository, while the VPA will try to update them based on resource usage. You can solve this with ArgoCD’s ignoreDifferences feature, which tells ArgoCD to ignore changes in specific parts of a resource template. You can see the example below.

ignoreDifferences:
  - group: apps               # Deployments live in the "apps" API group, not the core ("") group
    kind: Deployment
    jsonPointers:
      - /spec/template/spec/containers/0/resources/requests
      - /spec/template/spec/containers/0/resources/limits

  • If you are using GitOps, for example with ArgoCD, to deploy your workloads, make sure to omit the replicas field in your Deployments to avoid a conflict between your deployment template and the HPA. Otherwise, ArgoCD will constantly update your deployment spec to match what you have in your code repository, reversing the HPA’s changes. Omitting the replicas field allows the HPA to manage the number of replicas; alternatively, you can tell ArgoCD to ignore the field, as shown below.
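
If you would rather keep the replicas field in your repository, here is a sketch of an ignoreDifferences entry that leaves the replica count to the HPA:

ignoreDifferences:
  - group: apps
    kind: Deployment
    jsonPointers:
      - /spec/replicas        # let the HPA manage the number of replicas
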
  • It is best practice not to use the HPA and VPA together on the same resource metric, as this can cause conflicts. If you must use them together, ensure you understand the risk. To see what the problem might be, imagine you configure the HPA to add pods when memory utilization hits 85%, and the VPA to resize pods based on memory utilization as well. Whichever autoscaler reacts first changes the overall utilization, which then prompts the other to act: the VPA may decide the pods are no longer using their allocated resources and shrink them, the HPA then sees utilization rise again and adds replicas, and so on. Each autoscaler keeps trying to fix the problem while the other reverses it, and the cycle continues.
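
One common mitigation (a sketch, not the only option) is to run the VPA in recommendation-only mode, so it surfaces suggested requests without resizing pods, and let the HPA scale on its own metric:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa            # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app              # hypothetical deployment name
  updatePolicy:
    updateMode: "Off"         # recommend only; never evict or resize pods automatically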

Conclusion

Autoscaling in Kubernetes is a very powerful feature. It helps you manage your server resources efficiently. The different methods of autoscaling provide us with rich options for ensuring that our workloads are managed efficiently without manual intervention.

While autoscaling can be very useful for managing resources, it is important to implement it correctly. The autoscalers should be configured properly, with an understanding of the potential issues and in line with best practices.

Please leave your thoughts in the comment section. I want to hear from you.
