Turn off the lights: Scale to Zero with KEDA

At some point, every organization faces the challenge of escalating cloud infrastructure costs. Containerized workloads accumulate in cloud-based Kubernetes clusters, and if not properly configured, they can waste valuable resources such as compute, memory, and storage.

Tools like Kubecost and OpenCost can reveal a harsh reality: high bills often stem from leftover deployments that run continuously but are no longer needed. Another common source is the continuous delivery process, where applications and services are deployed across multiple environment stages (e.g., sandbox, development, UAT, staging, production). This results in several copies of the same application, each serving a specific purpose (testing, demonstration, integration, etc.). Many of these deployments, however, are rarely used and sit idle while still incurring significant costs.

Scaling to Zero

A common cost-cutting measure is to undeploy these idle applications or scale them down to a minimum number of containers while their traffic is negligible. With the Kubernetes Horizontal Pod Autoscaler (HPA), however, the minimum replica count is one (short of the alpha HPAScaleToZero feature gate), so even an idle workload keeps incurring compute costs.

Consider an enterprise context where some workloads are only used during business hours. Why pay for CPU 24/7, weekends included, when you need it for at most 8 hours a day? Eight hours on five weekdays amounts to 40 of the 168 hours in a week, so running such workloads continuously wastes roughly 75% of their compute cost.

There are several ways to scale the number of Pod replicas to zero. For instance, you could schedule a CronJob that patches the Deployment at the end of the workday, setting the replicas to zero, and another CronJob that resets it at the start of the workday. However, this approach does not play well with GitOps tools like ArgoCD, which would simply restore the desired number of instances defined in the configuration. Additionally, the HPA cannot scale to zero instances by default based on resource metrics like CPU or memory.
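
For illustration, here is a minimal sketch of that CronJob approach; the Deployment name my-service and the ServiceAccount scaler are placeholders, and the ServiceAccount needs RBAC permission to scale Deployments:

# A CronJob that scales my-service down to zero every weekday at 8 PM.
# A mirror-image CronJob with schedule "0 8 * * 1-5" and --replicas=1
# would bring it back up at the start of the workday.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-down-my-service
spec:
  schedule: "0 20 * * 1-5"            # weekdays at 8 PM
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: scaler  # must be allowed to scale Deployments
          restartPolicy: OnFailure
          containers:
            - name: kubectl
              image: bitnami/kubectl:latest
              command: ["kubectl", "scale", "deployment/my-service", "--replicas=0"]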

Kubernetes Event-Driven Autoscaling

KEDA (Kubernetes-based Event Driven Autoscaling) is an open-source project that automatically scales application container instances in Kubernetes based on specific events. It is particularly useful for event-driven applications, whose resource needs rise and fall with the flow of incoming events.

Common use cases include scaling Deployments based on:

  • The number of unprocessed messages in a queue or topic: start more consumers to meet increasing demand and stop them when the load decreases.
  • The result of a database query compared against a predetermined target value.
  • The Pods' resource-utilization metrics (CPU, memory): scale out when utilization surpasses a threshold and scale in when the load decreases.
  • The time of day.

Chart developers use KEDA's ScaledObject resource to define which events should trigger the scaling of a Deployment. KEDA feeds these events to a Horizontal Pod Autoscaler as external metrics, which contribute to the HPA's scaling decisions.

[Figure: Logical view of KEDA and its interaction with the scaled application]

In our scale-to-zero use case, we want the replica count to drop to zero outside of business hours. We define a cron trigger that is active only during business hours, allowing the Deployment to scale down to zero Pod instances the rest of the time.
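
As a preview, a minimal standalone ScaledObject with such a cron trigger could look like this (my-service is a placeholder; the full Helm-based setup is developed below):

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-service
spec:
  scaleTargetRef:
    name: my-service                  # the Deployment to scale
  minReplicaCount: 0                  # zero replicas while no trigger is active
  triggers:
    - type: cron
      metadata:
        timezone: America/New_York    # tz database name
        start: "0 8 * * 1-5"          # scale up on weekdays at 8 AM
        end: "0 20 * * 1-5"           # scale down on weekdays at 8 PM
        desiredReplicas: "1"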

Installing KEDA operator

To begin, we need to install the KEDA Operator into our Kubernetes cluster. Follow these steps to do it in a declarative, IaC-style manner:

Create a directory for the operator

First, create a directory for the KEDA Operator within your project folder:

mkdir -p helm/keda

Create the Chart.yaml file

Next, create a Chart.yaml file to specify the operator’s Helm repository and its version as a dependency:

# in helm/keda/Chart.yaml

apiVersion: v2
name: keda-chart
description: A Helm chart with KEDA as a sub-chart
type: application
version: 0.1.0
appVersion: "0.1.0"

dependencies:
  - name: keda
    version: "2.14"
    repository: https://kedacore.github.io/charts

See the KEDA Helm chart repository (https://kedacore.github.io/charts) for the latest chart versions.

Exclude Helm repo artifacts from Git

To avoid committing Helm repository artifacts to your version control system, create a .gitignore file:

# file: helm/keda/.gitignore
charts/

Download the chart’s dependencies

Download the chart’s dependencies into the charts directory (this also writes a Chart.lock file and places the KEDA chart archive into the git-ignored charts/ directory):

helm dep update helm/keda

Install the KEDA Operator

Finally, install the KEDA Operator using Helm:

helm upgrade -i keda helm/keda --namespace keda --create-namespace

As a result, the KEDA operator is deployed to the Kubernetes cluster. From now on, applications can set up event-driven autoscaling using KEDA's ScaledObject custom resource.
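
To quickly verify the installation (exact Pod names will differ):

kubectl get pods -n keda          # the operator and metrics-apiserver Pods should be Running
kubectl get crd | grep keda.sh    # scaledobjects, scaledjobs, triggerauthentications, ...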

Cron Scaler

We will use KEDA's Cron scaler to start the Pods of our Deployment at the beginning of business hours and stop them at the end.

In our application's Helm chart, let's define the following values to control the Deployment's autoscaling:

# file: values.yaml

autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 4
  targetCPUUtilizationPercentage: 80
  targetMemoryUtilizationPercentage: 80

  keda:
    enabled: true
    pollingInterval: 15     # how often (in seconds) to check the triggers
    idleReplicaCount: 0     # 0 permits "scale to zero" while no trigger is active
    cooldownPeriod: 10      # seconds to wait after the last trigger reported
                            # active before scaling the resource back to zero

    triggers:               # defines when the Pods should be running
      - type: cron
        metadata:
          timezone: "America/New_York"
          start: "0 8 * * 1-5"      # every weekday from 8 AM ...
          end: "0 20 * * 1-5"       # ... to 8 PM
          desiredReplicas: "{{ .Values.autoscaling.minReplicas }}"

KEDA can take ownership of an existing HPA and extend it with additional metrics, or it can create its own HPA instance.

Let's assume we already have an HPA instance for the Deployment, which we used previously to scale the Deployment's Pods horizontally based on CPU and/or memory utilization metrics:

# file: templates/hpa.yaml

{{- if .Values.autoscaling.enabled }}
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: {{ include "my-service.fullname" . }}
  labels:
    {{- include "my-service.labels" . | nindent 4 }}
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {{ include "my-service.fullname" . }}
  minReplicas: {{ .Values.autoscaling.minReplicas }}
  maxReplicas: {{ .Values.autoscaling.maxReplicas }}

  metrics:
    {{- if .Values.autoscaling.targetCPUUtilizationPercentage }}
    - type: Resource
      resource:
        name: cpu
        target:
          averageUtilization: {{ .Values.autoscaling.targetCPUUtilizationPercentage }}
          type: Utilization
    {{- end }}

    {{- if .Values.autoscaling.targetMemoryUtilizationPercentage }}
    - type: Resource
      resource:
        name: memory
        target:
          averageUtilization: {{ .Values.autoscaling.targetMemoryUtilizationPercentage }}
          type: Utilization
    {{- end }}
{{- end }}        
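
Before adding the ScaledObject, you can sanity-check what the chart renders so far (the chart path helm/my-service is a placeholder for your application's chart):

helm template my-service helm/my-service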

Then a KEDA ScaledObject, in a new template file (e.g., templates/keda-scaledobject.yaml), configures:

  • scaleTargetRef: Points to the Deployment to be scaled.
  • pollingInterval: Defines how often KEDA checks the triggers (in seconds).
  • minReplicaCount and maxReplicaCount: Set the minimum and maximum number of replicas.
  • idleReplicaCount: Specifies the number of replicas to maintain when no trigger is active.
  • cooldownPeriod: Time to wait before scaling back down (in seconds).
  • triggers: Defines the scaling triggers; here, the cron trigger from the values plus CPU and memory utilization.
  • advanced: Configures advanced settings, including taking over an existing HPA.

# file: templates/keda-scaledobject.yaml

{{- if .Values.autoscaling.keda.enabled }}
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: {{ include "my-service.fullname" . }}
  namespace: {{ .Release.Namespace }}
  annotations:
    scaledobject.keda.sh/transfer-hpa-ownership: "true"

spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {{ include "my-service.fullname" . }}

  pollingInterval: {{ .Values.autoscaling.keda.pollingInterval | default 15 }}
  minReplicaCount: {{ .Values.autoscaling.minReplicas }}
  maxReplicaCount: {{ .Values.autoscaling.maxReplicas }}
  idleReplicaCount: {{ .Values.autoscaling.keda.idleReplicaCount }}
  cooldownPeriod: {{ .Values.autoscaling.keda.cooldownPeriod | default 300 }}

  triggers:
{{- if not (hasKey .Values.autoscaling.keda "triggers") -}}
  {{ fail "autoscaling.keda.triggers is not defined in values" }}
{{- else -}}
    {{ tpl (.Values.autoscaling.keda.triggers | toYaml) . | nindent 4 }}
{{- end }}
    # ---
    {{- if .Values.autoscaling.targetCPUUtilizationPercentage }}
    - type: cpu
      metricType: Utilization
      metadata:
        value: "{{ .Values.autoscaling.targetCPUUtilizationPercentage }}"
    {{- end }}
    {{- if .Values.autoscaling.targetMemoryUtilizationPercentage }}
    - type: memory
      metricType: Utilization
      metadata:
        value: "{{ .Values.autoscaling.targetMemoryUtilizationPercentage }}"
    {{- end }}

  advanced:
    horizontalPodAutoscalerConfig:
      name: {{ include "my-service.fullname" . }}   # take over the existing HPA by name
{{- end }}

As a result of the above configuration, the Deployment's HPA is extended with an external metric that permits scaling the Deployment to zero instances outside of business hours. During business hours, the usual CPU and memory-utilization metrics determine how many Pod instances are needed to serve the actual traffic.
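
Once everything is deployed, the behavior can be observed with standard tooling (my-service stands for the release's full name):

kubectl get scaledobject my-service    # READY/ACTIVE status of the KEDA triggers
kubectl get hpa my-service             # now includes the KEDA-managed external metric
kubectl get deployment my-service      # replicas drop to 0 outside business hours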

Notes on ArgoCD

In the example above, KEDA takes ownership of an already existing HPA. If the service is deployed with ArgoCD, ArgoCD will endlessly try to restore the HPA's original state (for example, the metadata, metrics, or labels fields). The following ignoreDifferences configuration avoids this:

# the ArgoCD Application resource yaml

apiVersion: argoproj.io/v1alpha1
kind: Application
...
spec:
  ...
  ignoreDifferences:
    - group: autoscaling
      kind: HorizontalPodAutoscaler
      managedFieldsManagers:
        - keda            # ignore differences in all fields owned by KEDA
      jsonPointers:
        - /metadata/annotations
        - /metadata/labels
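
If ArgoCD still reports drift, inspecting which field manager owns the contested HPA fields can help pinpoint what to ignore (assuming the HPA is named my-service):

kubectl get hpa my-service -o yaml --show-managed-fields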

Conclusion

In this article, we explored the challenges of managing cloud infrastructure costs and how idle Kubernetes deployments can contribute to unnecessary expenses. We introduced KEDA (Kubernetes-based Event Driven Autoscaling) as a solution to dynamically scale containerized applications based on specific events, including scaling to zero during off-peak hours.

By installing the KEDA Operator and configuring a ScaledObject with a cron trigger, we demonstrated how to efficiently manage resources and reduce costs. Implementing KEDA's event-driven autoscaling capabilities allows organizations to optimize their Kubernetes deployments, ensuring that resources are used only when needed, ultimately leading to significant cost savings.


