Turn off the lights: Scale to Zero with KEDA

At some point, every organization faces the challenge of escalating cloud infrastructure costs. Containerized workloads accumulate in cloud-based Kubernetes clusters, and if not properly configured, they can waste valuable resources such as compute, memory, and storage.

Tools like Kubecost and OpenCost can reveal a harsh reality: high bills often stem from leftover deployments that run continuously but are no longer needed. Another common source is the continuous delivery process, where applications and services are deployed across multiple environment stages (e.g., sandbox, development, UAT, staging, production). This results in several copies of the same application, each serving a specific purpose (testing, demonstration, integration, etc.). Many of these deployments, however, are rarely used and sit idle while still incurring significant costs.

Scaling to Zero

A common cost-cutting measure is to undeploy these idle applications or scale them down to a minimum number of containers while their traffic is negligible. With the Kubernetes Horizontal Pod Autoscaler (HPA), however, the minimum replica count is one (short of the alpha HPAScaleToZero feature gate), so even an idle workload keeps incurring compute costs.

Consider an enterprise context where some workloads are only used during business hours. Why pay for CPU 24/7, weekends included, when you need it for at most 8 hours a day? Eight hours on five weekdays amounts to 40 of the 168 hours in a week, so running such workloads continuously wastes roughly 75% of their compute cost.

There are several ways to scale the number of Pod replicas to zero. For instance, you could schedule a CronJob that patches the Deployment at the end of the workday, setting the replicas to zero, and another CronJob that resets it at the start of the workday. However, this approach does not play well with GitOps tools like ArgoCD, which would simply restore the desired number of instances defined in the configuration. Additionally, the HPA cannot scale to zero instances by default based on resource metrics like CPU or memory.
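
For illustration, here is a minimal sketch of that CronJob approach; the Deployment name my-service and the ServiceAccount scaler are placeholders, and the ServiceAccount needs RBAC permission to scale Deployments:

# A CronJob that scales my-service down to zero every weekday at 8 PM.
# A mirror-image CronJob with schedule "0 8 * * 1-5" and --replicas=1
# would bring it back up at the start of the workday.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-down-my-service
spec:
  schedule: "0 20 * * 1-5"            # weekdays at 8 PM
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: scaler  # must be allowed to scale Deployments
          restartPolicy: OnFailure
          containers:
            - name: kubectl
              image: bitnami/kubectl:latest
              command: ["kubectl", "scale", "deployment/my-service", "--replicas=0"]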

Kubernetes Event-Driven Autoscaling

KEDA (Kubernetes-based Event Driven Autoscaling) is an open-source project that automatically scales application container instances in Kubernetes based on specific events. It is particularly useful for event-driven applications, whose resource needs rise and fall with the flow of incoming events.

Common use cases include scaling Deployments based on:

  • The number of unprocessed messages in a queue or topic: start more consumers to meet increasing demand and stop them when the load decreases.
  • The result of a database query compared against a predetermined target value.
  • The Pods' resource-utilization metrics (CPU, memory): scale out when utilization surpasses a threshold and scale in when the load decreases.
  • The time of day.

Chart developers use KEDA's ScaledObject resource to define which events should trigger the scaling of a Deployment. KEDA feeds these events to a Horizontal Pod Autoscaler as external metrics, which contribute to the HPA's scaling decisions.

[Figure: Logical view of KEDA and its interaction with the scaled application]

In our scale-to-zero use case, we want the replica count to drop to zero outside of business hours. We define a cron trigger that is active only during business hours, allowing the Deployment to scale down to zero Pod instances the rest of the time.
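
As a preview, a minimal standalone ScaledObject with such a cron trigger could look like this (my-service is a placeholder; the full Helm-based setup is developed below):

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-service
spec:
  scaleTargetRef:
    name: my-service                  # the Deployment to scale
  minReplicaCount: 0                  # zero replicas while no trigger is active
  triggers:
    - type: cron
      metadata:
        timezone: America/New_York    # tz database name
        start: "0 8 * * 1-5"          # scale up on weekdays at 8 AM
        end: "0 20 * * 1-5"           # scale down on weekdays at 8 PM
        desiredReplicas: "1"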

Installing KEDA operator

To begin, we need to install the KEDA Operator into our Kubernetes cluster. Follow these steps to do it in a declarative, IaC-style manner:

Create a directory for the operator

First, create a directory for the KEDA Operator within your project folder:

mkdir -p helm/keda

Create the Chart.yaml file

Next, create a Chart.yaml file to specify the operator’s Helm repository and its version as a dependency:

# in helm/keda/Chart.yaml

apiVersion: v2
name: keda-chart
description: A Helm chart with KEDA as a sub-chart
type: application
version: 0.1.0
appVersion: "0.1.0"

dependencies:
  - name: keda
    version: "2.14"
    repository: https://kedacore.github.io/charts

See the KEDA Helm chart repository (https://kedacore.github.io/charts) for the latest chart versions.

Exclude Helm repo artifacts from Git

To avoid committing Helm repository artifacts to your version control system, create a .gitignore file:

# file: helm/keda/.gitignore
charts/

Download the chart’s dependencies

Download the chart’s dependencies into the charts directory (this also writes a Chart.lock file and places the KEDA chart archive into the git-ignored charts/ directory):

helm dep update helm/keda

Install the KEDA Operator

Finally, install the KEDA Operator using Helm:

helm upgrade -i keda helm/keda --namespace keda --create-namespace

As a result, the KEDA operator is deployed to the Kubernetes cluster. From now on, applications can set up event-driven autoscaling using KEDA's ScaledObject custom resource.
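
To quickly verify the installation (exact Pod names will differ):

kubectl get pods -n keda          # the operator and metrics-apiserver Pods should be Running
kubectl get crd | grep keda.sh    # scaledobjects, scaledjobs, triggerauthentications, ...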

Cron Scaler

We will use KEDA's Cron scaler to start the Pods of our Deployment at the beginning of business hours and stop them at the end.

In our application's Helm chart, let's define the following values to control the Deployment's autoscaling:

# file: values.yaml

autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 4
  targetCPUUtilizationPercentage: 80
  targetMemoryUtilizationPercentage: 80

  keda:
    enabled: true
    pollingInterval: 15     # how often (in seconds) to check the triggers
    idleReplicaCount: 0     # 0 permits "scale to zero" while no trigger is active
    cooldownPeriod: 10      # seconds to wait after the last trigger reported
                            # active before scaling the resource back to zero

    triggers:               # defines when the Pods should be running
      - type: cron
        metadata:
          timezone: "America/New_York"
          start: "0 8 * * 1-5"      # every weekday from 8 AM ...
          end: "0 20 * * 1-5"       # ... to 8 PM
          desiredReplicas: "{{ .Values.autoscaling.minReplicas }}"

KEDA can take ownership of an existing HPA and extend it with additional metrics, or it can create its own HPA instance.

Let's assume we already have an HPA instance for the Deployment, which we used previously to scale the Deployment's Pods horizontally based on CPU and/or memory utilization metrics:

# file: templates/hpa.yaml

{{- if .Values.autoscaling.enabled }}
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: {{ include "my-service.fullname" . }}
  labels:
    {{- include "my-service.labels" . | nindent 4 }}
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {{ include "my-service.fullname" . }}
  minReplicas: {{ .Values.autoscaling.minReplicas }}
  maxReplicas: {{ .Values.autoscaling.maxReplicas }}

  metrics:
    {{- if .Values.autoscaling.targetCPUUtilizationPercentage }}
    - type: Resource
      resource:
        name: cpu
        target:
          averageUtilization: {{ .Values.autoscaling.targetCPUUtilizationPercentage }}
          type: Utilization
    {{- end }}

    {{- if .Values.autoscaling.targetMemoryUtilizationPercentage }}
    - type: Resource
      resource:
        name: memory
        target:
          averageUtilization: {{ .Values.autoscaling.targetMemoryUtilizationPercentage }}
          type: Utilization
    {{- end }}
{{- end }}        
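
Before adding the ScaledObject, you can sanity-check what the chart renders so far (the chart path helm/my-service is a placeholder for your application's chart):

helm template my-service helm/my-service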

Then a KEDA ScaledObject, in a new template file (e.g., templates/keda-scaledobject.yaml), configures:

  • scaleTargetRef: Points to the Deployment to be scaled.
  • pollingInterval: Defines how often KEDA checks the triggers (in seconds).
  • minReplicaCount and maxReplicaCount: Set the minimum and maximum number of replicas.
  • idleReplicaCount: Specifies the number of replicas to maintain when no trigger is active.
  • cooldownPeriod: Time to wait before scaling back down (in seconds).
  • triggers: Defines the scaling triggers; here, the cron trigger from the values plus CPU and memory utilization.
  • advanced: Configures advanced settings, including taking over an existing HPA.

# file: templates/keda-scaledobject.yaml

{{- if .Values.autoscaling.keda.enabled }}
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: {{ include "my-service.fullname" . }}
  namespace: {{ .Release.Namespace }}
  annotations:
    scaledobject.keda.sh/transfer-hpa-ownership: "true"

spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {{ include "my-service.fullname" . }}

  pollingInterval: {{ .Values.autoscaling.keda.pollingInterval | default 15 }}
  minReplicaCount: {{ .Values.autoscaling.minReplicas }}
  maxReplicaCount: {{ .Values.autoscaling.maxReplicas }}
  idleReplicaCount: {{ .Values.autoscaling.keda.idleReplicaCount }}
  cooldownPeriod: {{ .Values.autoscaling.keda.cooldownPeriod | default 300 }}

  triggers:
{{- if not (hasKey .Values.autoscaling.keda "triggers") -}}
  {{ fail "autoscaling.keda.triggers is not defined in values" }}
{{- else -}}
    {{ tpl (.Values.autoscaling.keda.triggers | toYaml) . | nindent 4 }}
{{- end }}
    # ---
    {{- if .Values.autoscaling.targetCPUUtilizationPercentage }}
    - type: cpu
      metricType: Utilization
      metadata:
        value: "{{ .Values.autoscaling.targetCPUUtilizationPercentage }}"
    {{- end }}
    {{- if .Values.autoscaling.targetMemoryUtilizationPercentage }}
    - type: memory
      metricType: Utilization
      metadata:
        value: "{{ .Values.autoscaling.targetMemoryUtilizationPercentage }}"
    {{- end }}

  advanced:
    horizontalPodAutoscalerConfig:
      name: {{ include "my-service.fullname" . }}   # take over the existing HPA by name
{{- end }}

As a result of the above configuration, the Deployment's HPA is extended with an external metric that permits scaling the Deployment to zero instances outside of business hours. During business hours, the usual CPU and memory-utilization metrics determine how many Pod instances are needed to serve the actual traffic.
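
Once everything is deployed, the behavior can be observed with standard tooling (my-service stands for the release's full name):

kubectl get scaledobject my-service    # READY/ACTIVE status of the KEDA triggers
kubectl get hpa my-service             # now includes the KEDA-managed external metric
kubectl get deployment my-service      # replicas drop to 0 outside business hours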

Notes on ArgoCD

In the example above, KEDA takes ownership of an already existing HPA. If the service is deployed with ArgoCD, ArgoCD will endlessly try to restore the HPA's original state (for example, the metadata, metrics, or labels fields). The following ignoreDifferences configuration avoids this:

# the ArgoCD Application resource yaml

apiVersion: argoproj.io/v1alpha1
kind: Application
...
spec:
  ...
  ignoreDifferences:
    - group: autoscaling
      kind: HorizontalPodAutoscaler
      managedFieldsManagers:
        - keda            # ignore differences in all fields owned by KEDA
      jsonPointers:
        - /metadata/annotations
        - /metadata/labels
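
If ArgoCD still reports drift, inspecting which field manager owns the contested HPA fields can help pinpoint what to ignore (assuming the HPA is named my-service):

kubectl get hpa my-service -o yaml --show-managed-fields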

Conclusion

In this article, we explored the challenges of managing cloud infrastructure costs and how idle Kubernetes deployments can contribute to unnecessary expenses. We introduced KEDA (Kubernetes-based Event Driven Autoscaling) as a solution to dynamically scale containerized applications based on specific events, including scaling to zero during off-peak hours.

By installing the KEDA Operator and configuring a ScaledObject with a cron trigger, we demonstrated how to efficiently manage resources and reduce costs. Implementing KEDA's event-driven autoscaling capabilities allows organizations to optimize their Kubernetes deployments, ensuring that resources are used only when needed, ultimately leading to significant cost savings.


