Turn off the lights: Scale to Zero with KEDA
At some point, every organization faces the challenge of escalating cloud infrastructure costs. Containerized workloads accumulate in cloud-based Kubernetes clusters, and if not properly configured, they can waste valuable resources such as compute, memory, and storage.
Tools like Kubecost and OpenCost can reveal a harsh reality: high bills often stem from leftover deployments that run continuously but are no longer needed. Another common issue arises from the continuous delivery process, where applications and services are deployed across multiple environment stages (e.g., sandbox, development, UAT, staging, production). This results in several copies of the same application, each serving a specific purpose (testing, demonstration, integration, etc.). However, many of these deployments are rarely used and remain idle, incurring significant costs.
Scaling to Zero
A common cost-cutting solution is to undeploy these idle applications or scale them down to the minimum number of containers when their traffic is negligible. However, with the Kubernetes Horizontal Pod Autoscaler (HPA), the minimum number of replicas is one by default, which still incurs unwanted compute costs.
Consider an enterprise context where some workloads are only used during business hours. Why pay for compute 24/7, including weekends, when you only need it for up to 8 hours a day? Eight hours on weekdays amounts to 40 of the 168 hours in a week, roughly 24% utilization, so running these workloads around the clock wastes about 75% of their compute cost.
There are several ways to scale the number of Pod replicas to zero. For instance, you could schedule a CronJob that patches the Deployment at the end of the workday, setting the replicas to zero, and a second CronJob that resets it at the start of the workday (sketched below). However, this approach does not play well with GitOps tools like ArgoCD, which would keep restoring the desired replica count defined in the configuration. Additionally, the HPA cannot scale to zero instances by default based on resource metrics like CPU or memory.
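As an illustration, here is a minimal sketch of the CronJob approach. The Deployment name (my-service), the ServiceAccount, and the image are assumptions, and the ServiceAccount would need RBAC permissions to scale Deployments:

# hypothetical file: scale-down-cronjob.yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-down-my-service
spec:
  schedule: "0 20 * * 1-5" # weekdays at 8 PM
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: scaler # assumed; needs RBAC to scale Deployments
          restartPolicy: OnFailure
          containers:
            - name: kubectl
              image: bitnami/kubectl:latest # assumed image
              command: ["kubectl", "scale", "deployment/my-service", "--replicas=0"]

A mirror CronJob scheduled for 8 AM would set --replicas back to the desired count.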
Kubernetes Event-Driven Autoscaling
KEDA (Kubernetes-based Event Driven Autoscaling) is an open-source project that automatically scales application container instances in Kubernetes based on specific events. It is particularly useful for efficiently managing event-driven applications by dynamically scaling resources.
Common use cases include scaling Deployments based on:
- the length of a message queue (e.g., Kafka, RabbitMQ, AWS SQS)
- the result of a database query
- Prometheus metrics
- a cron schedule
Developers use KEDA's ScaledObject custom resource to define which events should trigger the scaling of a Deployment. KEDA supplies these events to a Horizontal Pod Autoscaler as external metrics, contributing to the HPA's scaling decisions.
In our scale-to-zero use case, we aim to scale the replica count to zero outside of business hours. We define a cron trigger that activates only during business hours, allowing the deployment to scale down to zero Pod instances otherwise.
Installing the KEDA Operator
To begin, we need to install the KEDA Operator into our Kubernetes cluster. Follow these steps to do it in a declarative, IaC style:
Create a Directory for the Operator
First, create a directory for the KEDA Operator within your project folder:
mkdir -p helm/keda
Create the Chart.yaml File
Next, create a Chart.yaml file to specify the operator's Helm repository and its version as a dependency:
# file: helm/keda/Chart.yaml
apiVersion: v2
name: keda-chart
description: A Helm chart with KEDA as a sub-chart
type: application
version: 0.1.0
appVersion: "0.1.0"
dependencies:
  - name: keda
    version: "2.14"
    repository: https://kedacore.github.io/charts
Check the KEDA Helm chart repository for the latest available chart versions.
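Assuming the Helm CLI is installed, you can also list the published chart versions directly:

helm repo add kedacore https://kedacore.github.io/charts
helm search repo kedacore/keda --versions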
Exclude Helm Repo Artifacts from Git
To avoid committing Helm repository artifacts to your version control system, create a .gitignore file:
# file: helm/keda/.gitignore
charts/
Download the Chart's Dependencies
Download the chart's dependencies into the charts directory:
helm dep update helm/keda
Install the KEDA Operator
Finally, install the KEDA Operator using Helm:
helm upgrade -i keda helm/keda --namespace keda --create-namespace
As a result, the KEDA Operator is deployed to the Kubernetes cluster. From now on, applications can set up event-driven autoscaling using KEDA's ScaledObject custom resource.
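To verify the installation, check that the operator Pods are running and that KEDA's custom resource definitions are registered:

kubectl get pods --namespace keda
kubectl get crd scaledobjects.keda.sh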
Cron Scaler
We will use KEDA's Cron scaler to start the Pods of our Deployment at the beginning of the business hours and stop them at the end.
In our application's Helm chart, let's define the following values to control the Deployment's autoscaling:
# file: values.yaml
autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 4
  targetCPUUtilizationPercentage: 80
  targetMemoryUtilizationPercentage: 80
  keda:
    enabled: true
    pollingInterval: 15 # how frequently to check the triggers, in seconds
    idleReplicaCount: 0 # set to 0 to permit "scale to zero" when no trigger is active
    cooldownPeriod: 10  # seconds to wait after the last active trigger before scaling back to zero
    triggers: # defines when the Pods should be running
      - type: cron
        metadata:
          timezone: "America/New_York"
          start: "0 8 * * 1-5"  # every weekday at 8 AM ...
          end: "0 20 * * 1-5"   # ... until 8 PM
          desiredReplicas: "{{ .Values.autoscaling.minReplicas }}"
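Note that desiredReplicas holds a Helm template expression as a string; it is resolved by the tpl function in the ScaledObject template shown later. With the values above, the rendered trigger would look like this (illustrative):

- type: cron
  metadata:
    timezone: "America/New_York"
    start: "0 8 * * 1-5"
    end: "0 20 * * 1-5"
    desiredReplicas: "2"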
KEDA can take ownership of an existing HPA and add custom metrics to it for scaling. Alternatively, it can create its own HPA instance.
Let's assume we already have an HPA instance for the Deployment, which we used previously to scale the Deployment's Pods horizontally based on CPU and/or memory utilization metrics:
# file: templates/hpa.yaml
{{- if .Values.autoscaling.enabled }}
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: {{ include "my-service.fullname" . }}
  labels:
    {{- include "my-service.labels" . | nindent 4 }}
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {{ include "my-service.fullname" . }}
  minReplicas: {{ .Values.autoscaling.minReplicas }}
  maxReplicas: {{ .Values.autoscaling.maxReplicas }}
  metrics:
    {{- if .Values.autoscaling.targetCPUUtilizationPercentage }}
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: {{ .Values.autoscaling.targetCPUUtilizationPercentage }}
    {{- end }}
    {{- if .Values.autoscaling.targetMemoryUtilizationPercentage }}
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: {{ .Values.autoscaling.targetMemoryUtilizationPercentage }}
    {{- end }}
{{- end }}
Then a KEDA ScaledObject takes over this HPA and extends it with the triggers defined in the values above. The key parts are the scaledobject.keda.sh/transfer-hpa-ownership annotation and the advanced.horizontalPodAutoscalerConfig section, which point KEDA at the existing HPA instead of letting it create a new one:
# file: templates/scaledobject.yaml
{{- if .Values.autoscaling.keda.enabled }}
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: {{ include "my-service.fullname" . }}
  namespace: {{ .Release.Namespace }}
  annotations:
    scaledobject.keda.sh/transfer-hpa-ownership: "true"
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {{ include "my-service.fullname" . }}
  pollingInterval: {{ .Values.autoscaling.keda.pollingInterval | default 15 }}
  minReplicaCount: {{ .Values.autoscaling.minReplicas }}
  maxReplicaCount: {{ .Values.autoscaling.maxReplicas }}
  idleReplicaCount: {{ .Values.autoscaling.keda.idleReplicaCount }}
  cooldownPeriod: {{ .Values.autoscaling.keda.cooldownPeriod | default 300 }}
  triggers:
    {{- if not (hasKey .Values.autoscaling.keda "triggers") }}
    {{- fail "autoscaling.keda.triggers is not defined in values" }}
    {{- else }}
    {{- tpl (.Values.autoscaling.keda.triggers | toYaml) . | nindent 4 }}
    {{- end }}
    # resource-based triggers mirror the original HPA metrics
    {{- if .Values.autoscaling.targetCPUUtilizationPercentage }}
    - type: cpu
      metricType: Utilization
      metadata:
        value: "{{ .Values.autoscaling.targetCPUUtilizationPercentage }}"
    {{- end }}
    {{- if .Values.autoscaling.targetMemoryUtilizationPercentage }}
    - type: memory
      metricType: Utilization
      metadata:
        value: "{{ .Values.autoscaling.targetMemoryUtilizationPercentage }}"
    {{- end }}
  advanced:
    horizontalPodAutoscalerConfig:
      name: {{ include "my-service.fullname" . }} # take over the existing HPA by name
{{- end }}
As a result of the above configuration, the Deployment's HPA is extended with an external metric that permits scaling the Deployment to zero instances outside business hours. During business hours, the usual CPU and memory utilization metrics determine how many Pod instances are needed to serve the actual traffic.
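You can verify the takeover by inspecting the ScaledObject and the HPA; the namespace and resource names below are placeholders for your own:

kubectl get scaledobject --namespace my-namespace
kubectl describe hpa my-service --namespace my-namespace

In the HPA's description, the cron trigger should appear as an external metric alongside the original resource metrics.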
Notes on ArgoCD
In the above example, KEDA takes ownership of an already existing HPA. If the service is deployed with ArgoCD, then ArgoCD will endlessly try to restore the HPA's original state (such as the metadata, metrics, or labels fields). To avoid this, the following ignoreDifferences configuration helps:
# the ArgoCD Application resource yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
...
ignoreDifferences:
  - group: "autoscaling"
    kind: "HorizontalPodAutoscaler"
    managedFieldsManagers:
      - "keda" # ignore differences in all fields owned by KEDA
    jsonPointers:
      - /metadata/annotations
      - /metadata/labels
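If you use the argocd CLI, you can confirm that the KEDA-managed fields no longer show up as drift (the application name here is a placeholder):

argocd app diff my-service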
Conclusion
In this article, we explored the challenges of managing cloud infrastructure costs and how idle Kubernetes deployments can contribute to unnecessary expenses. We introduced KEDA (Kubernetes-based Event Driven Autoscaling) as a solution to dynamically scale containerized applications based on specific events, including scaling to zero during off-peak hours.
By installing the KEDA Operator and configuring a ScaledObject with a cron trigger, we demonstrated how to efficiently manage resources and reduce costs. Implementing KEDA's event-driven autoscaling capabilities allows organizations to optimize their Kubernetes deployments, ensuring that resources are used only when needed, ultimately leading to significant cost savings.