Setting up a Horizontal Pod Autoscaler for a Kubernetes cluster
In a Docker-based microservice setup, the application does not automatically scale with the number of users accessing it during business hours.
To resolve this, the team at CloudifyOps suggested implementing a Horizontal Pod Autoscaler (HPA) in the Kubernetes environment. With this approach, we can scale the application pods based on CPU and memory utilization. It also reduces the need to run more pods (more replicas) during non-business hours.
Introduction:
Kubernetes autoscaling: Kubernetes offers three scaling tools: the Horizontal Pod Autoscaler (HPA), the Vertical Pod Autoscaler (VPA), and the Cluster Autoscaler. The HPA and VPA work at the application (pod) layer, while the Cluster Autoscaler scales the nodes themselves.
Horizontal pod autoscaling: When a spike or drop in consumption occurs, Kubernetes can automatically increase or decrease the number of pods that serve the workload.
Vertical pod autoscaling: Deciding how much CPU and memory to dedicate to a particular workload is challenging. With the right configuration, the VPA adjusts a pod's resource requests so that you get the most out of the allocated resources.
Requirement: a running Kubernetes cluster (created with kops in this walkthrough), the metrics server, and resource requests and limits defined on the workload to be scaled.
Steps to follow:
Installing the metrics-server: The goal of the HPA is to make scaling decisions based on the per-pod resource metrics that are retrieved from the metrics API (metrics.k8s.io).
Create the cluster without passing the --yes argument; this only generates the cluster configuration without provisioning anything. We then need to make the below changes to the cluster configuration so that the metrics server can authenticate against the kubelets.
The metrics server is what feeds the HPA, so it must be in place before the HPA can work.
For a cluster created with kops, follow these steps:
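First, generate the cluster configuration without the --yes flag. A minimal sketch, assuming placeholder values for the cluster name, state store, and zone (substitute your own):

kops create cluster \
  --name=hpa-demo.example.com \
  --state=s3://my-kops-state-store \
  --zones=us-east-1a \
  --node-count=2
# No --yes flag: kops only writes the cluster spec to the state store; nothing is provisioned yet.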
Next, add the below configuration to the cluster spec under the kubelet section (for example with kops edit cluster):
kubelet:
  anonymousAuth: false
  authorizationMode: Webhook
  authenticationTokenWebhook: true
After making the changes, we should update the cluster. With the below command, the cluster will be created with the required configuration.
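For example, with the same placeholder name and state store as above:

kops update cluster --name=hpa-demo.example.com --state=s3://my-kops-state-store --yes
# --yes applies the configuration and actually creates the cluster resources.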
If you are changing the configuration after the cluster is already deployed, you need to run a rolling update, which terminates and redeploys the master; new nodes are then brought up and the old nodes are terminated.
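That after-the-fact workflow would look roughly like this (same placeholder names as before):

kops edit cluster --name=hpa-demo.example.com --state=s3://my-kops-state-store      # add the kubelet settings
kops update cluster --name=hpa-demo.example.com --state=s3://my-kops-state-store --yes
kops rolling-update cluster --name=hpa-demo.example.com --state=s3://my-kops-state-store --yes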
To avoid this recreation, we are following the above steps while creating the Kubernetes cluster with kops (Kubernetes operations).
Now we need to install the metrics server
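The simplest option is to apply the official release manifest (pin a specific version if you prefer; the latest is used here):

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml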
The output of the metrics server creation step lists the objects that were created: the metrics-server ServiceAccount, the related ClusterRoles and bindings, the metrics-server Service and Deployment, and the v1beta1.metrics.k8s.io APIService.
To confirm the metrics server installation, list the pods in the kube-system namespace:
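For example (the default manifest deploys the metrics server into kube-system):

kubectl get pods -n kube-system
# or check the deployment directly:
kubectl get deployment metrics-server -n kube-system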
You will find the metrics-server pod in the list.
We can find out the memory and CPU utilization of pods and nodes using the below commands:
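Both read from the same metrics API:

kubectl top nodes
kubectl top pods
# use -n <namespace> or --all-namespaces to look at pods outside the default namespace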
The output lists each node or pod along with its current CPU usage (in cores/millicores) and memory usage.
Resource Requests and limits:
If the node a pod is running on has spare capacity, a container may use more than its requested resources. However, a container can't use more than its resource limit.
For example, if you set a memory request of 256 MiB for a container and the node it is scheduled on has memory to spare, the container can use more RAM than that.
If a memory limit of 4 GiB is also set, the kubelet and the container runtime enforce it: the runtime stops a process that tries to consume more than the permitted amount of memory.
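A minimal sketch of what such a container spec looks like (the pod name, image, and values are only illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: resource-demo
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        cpu: 200m      # reserved for the container at scheduling time
        memory: 256Mi
      limits:
        cpu: 500m      # CPU is throttled beyond this
        memory: 4Gi    # the process is killed if it exceeds this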
Configuring HPA:
It is important that resource requests and limits are defined in the container spec, as shown in the snippet above.
First, we will start a deployment running the php-apache test image and expose it as a service using the following command:
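The upstream HPA walkthrough provides a ready-made manifest for this (a php-apache Deployment built on the hpa-example image, with a 200m CPU request and 500m limit, plus a matching Service), which we assume here as the test workload:

kubectl apply -f https://k8s.io/examples/application/php-apache.yaml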
One new deployment and one service will be created with the above command. After the deployment is up, we need to deploy the HPA.
Run the below command:
echo 'apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50' | kubectl apply -f -
The HPA is now deployed; you can check its status as shown below.
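For example (the exact column layout varies slightly between kubectl versions; the numbers below are only indicative of an idle deployment):

kubectl get hpa php-apache
# NAME         REFERENCE               TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
# php-apache   Deployment/php-apache   0%/50%    1         10        1          2m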
As there is no load applied on the deployment, the targets show 0%/50%. To test the HPA, we shall apply load on the deployment.
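One way to generate load, borrowed from the upstream HPA walkthrough, is a throwaway busybox pod that requests the php-apache service in a loop:

kubectl run -i --tty load-generator --rm --image=busybox:1.28 --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"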
Run the above command to apply some load on the deployment. The load generator keeps printing OK! as its requests succeed, and the HPA targets start to climb.
The deployment is scaled up: once CPU utilization rises above the 50% target, the HPA increases the replica count; in our test it scaled up to 7 pods.
The default time to scale down is 300 seconds. The scale down time can be customized to suit different requirements.
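If your cluster supports the autoscaling/v2 API, the window can be tuned per HPA through the behavior field; a minimal sketch (lowering it from the 300-second default):

spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 60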
Note: If you use AWS EKS, the metrics server is not installed by default and needs to be deployed with the following command.
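The same upstream manifest used earlier works on EKS as well:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml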
With the above exercise, we applied load on the CPU. Similarly, the HPA can be configured to scale on memory usage as well (this requires the autoscaling/v2 API rather than autoscaling/v1).
To learn more about these cutting edge technologies & real time industry applied best practices, follow our LinkedIn Page. To explore our services, visit our website.