Kubernetes Auto Scaling: HPA, VPA, and Cluster Autoscaler for Cost Savings & Performance Gains!

Scalability is a key advantage of Kubernetes, ensuring applications can handle varying workloads efficiently and cost-effectively. But how does Kubernetes scale workloads?

Kubernetes provides three types of auto-scaling mechanisms:

• Horizontal Pod Autoscaler (HPA) – Scales the number of pods based on CPU, memory, or custom metrics.

• Vertical Pod Autoscaler (VPA) – Adjusts CPU & memory requests/limits of pods dynamically.

• Cluster Autoscaler (CA) – Scales worker nodes up or down based on the number of unschedulable pods.

Let’s break down each of these in detail!


1. Horizontal Pod Autoscaler (HPA)

HPA adjusts the number of running pods in a Deployment, ReplicaSet, or StatefulSet based on observed metrics such as:

• CPU utilization

• Memory usage

• Custom metrics (e.g., request count, latency, queue depth)

It helps handle increased traffic loads by adding more pods and reducing costs by removing excess pods when demand decreases.

How HPA Works:

1. Collects metrics from the Kubernetes Metrics Server or Prometheus.

2. Compares observed values with the target threshold.

3. Increases or decreases the number of replicas accordingly (using the formula shown below).
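
Under the hood, the HPA controller uses a simple ratio to compute the desired replica count (this is the documented scaling formula):

desiredReplicas = ceil( currentReplicas × currentMetricValue / desiredMetricValue )

For example, 4 replicas averaging 90% CPU against a 70% target give ceil(4 × 90 / 70) = ceil(5.14) = 6 replicas.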

Example: HPA Based on CPU Utilization

1. First, ensure that the Metrics Server is installed (required for HPA):

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
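
Once the Metrics Server pod is up, you can confirm that metrics are being served (standard kubectl commands; the first readings can take a minute or two to appear):

kubectl get deployment metrics-server -n kube-system
kubectl top nodes
kubectl top pods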

2. Deploy an application with resource requests:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: nginx
        resources:
          requests:
            cpu: "200m"   # HPA utilization percentages are calculated against this request
          limits:
            cpu: "500m"

3. Create an HPA resource:

kubectl autoscale deployment my-app --cpu-percent=70 --min=2 --max=10        

  • If average CPU utilization exceeds 70% of the requested CPU, HPA will increase pod replicas (up to 10).
  • When CPU usage drops, HPA will remove excess pods (down to the minimum of 2) to optimize costs.

The same autoscaler can also be written declaratively, as shown below.
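
A minimal sketch using the autoscaling/v2 API, targeting the my-app Deployment from step 2 (the name my-app-hpa is illustrative):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

Apply it with kubectl apply -f hpa.yaml and watch scaling decisions with kubectl get hpa my-app-hpa --watch.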

When to Use HPA?

• Stateless applications like web servers, API gateways, or microservices.

• Applications with fluctuating traffic, such as e-commerce sites or event-based services.


2. Vertical Pod Autoscaler (VPA)

Unlike HPA, which scales horizontally by adding pods, VPA scales vertically by adjusting CPU & memory requests and limits for a pod.

How VPA Works:

1. Observes historical resource usage and current workload patterns.

2. Suggests optimal resource requests and limits based on the data.

3. Automatically updates pod configurations (if enabled in Auto mode).

Note: VPA applies new resource requests/limits by evicting and recreating pods, making it less suitable for applications that cannot tolerate restarts.
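
If you do run VPA in Auto mode on a sensitive workload, a PodDisruptionBudget can limit how many pods are evicted at once. A minimal sketch, assuming the my-app Deployment from earlier and that at least one replica must always stay up:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 1   # keep at least one my-app pod running during voluntary evictions
  selector:
    matchLabels:
      app: my-app

VPA's updater evicts pods through the eviction API, so it respects this budget when restarting pods.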

Example: VPA with Auto Mode

1. Deploy VPA (the VPA components are installed with the script shipped in the kubernetes/autoscaler repository):

git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh

2. Define a VPA resource:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"   # apply recommendations automatically by evicting and recreating pods

  • Off mode → Only provides recommendations (useful for monitoring; see the command below to view them).
  • Initial mode → Sets requests/limits only when a pod is first created.
  • Auto mode → Continuously adjusts resource allocation by evicting and recreating pods.
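
Whichever mode you choose, you can inspect the current recommendations at any time with standard kubectl (the name matches the VPA defined above):

kubectl describe vpa my-app-vpa

Look for the Recommendation section in the output, which lists target and lower/upper bound CPU and memory values per container.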

When to Use VPA?

• Stateful applications like databases, ML models, or in-memory caches.

• Workloads with unpredictable resource usage where setting static limits is inefficient.


3. Cluster Autoscaler (CA)

Both HPA and VPA adjust pod behavior, but what happens if there aren’t enough nodes to schedule new pods? That’s where Cluster Autoscaler (CA) helps!

It dynamically adjusts the number of worker nodes in a Kubernetes cluster based on demand.

How Cluster Autoscaler Works:

1. Checks whether any pods are stuck in a Pending state due to insufficient resources.

2. Requests the cloud provider (AWS, GCP, Azure) to add more worker nodes, within the node group's configured size limits (see the example after this list).

3. If nodes are underutilized, it removes them to save costs.
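
Cluster Autoscaler can only scale node groups that have explicit size bounds configured. As an illustrative sketch with eksctl (the node group name my-nodegroup matches the one used later in this example):

eksctl create nodegroup \
  --cluster=my-cluster \
  --name=my-nodegroup \
  --nodes=2 \
  --nodes-min=1 \
  --nodes-max=10 \
  --asg-access

The --asg-access flag attaches the IAM permissions the autoscaler needs to resize the underlying Auto Scaling group.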

Example: Enabling Cluster Autoscaler in AWS EKS

1. Associate an IAM OIDC provider with the cluster, so CA can use IAM roles for service accounts:

eksctl utils associate-iam-oidc-provider --region=us-east-1 --cluster=my-cluster --approve        
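
With the OIDC provider associated, bind the cluster-autoscaler service account (referenced in the Deployment below) to an IAM role. The policy ARN here is a placeholder for a custom policy granting the Auto Scaling permissions CA requires:

eksctl create iamserviceaccount \
  --cluster=my-cluster \
  --namespace=kube-system \
  --name=cluster-autoscaler \
  --attach-policy-arn=arn:aws:iam::<ACCOUNT_ID>:policy/ClusterAutoscalerPolicy \
  --approve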

2. Deploy Cluster Autoscaler:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      serviceAccountName: cluster-autoscaler
      containers:
      - name: cluster-autoscaler
        # Match the cluster-autoscaler version to your cluster's Kubernetes minor version
        # (newer releases are published on registry.k8s.io)
        image: k8s.gcr.io/autoscaling/cluster-autoscaler:v1.21.0
        command:
        - ./cluster-autoscaler
        - --cloud-provider=aws
        - --nodes=1:10:my-nodegroup   # <min>:<max>:<node group name>

  • --nodes=1:10:my-nodegroup → Scale this node group between 1 and 10 worker nodes (an auto-discovery alternative is shown below).
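
Instead of hard-coding node group names, CA can discover Auto Scaling groups by tag using the --node-group-auto-discovery flag. An illustrative alternative for the Deployment's command section, assuming the cluster is named my-cluster and its ASGs carry the standard cluster-autoscaler tags:

        command:
        - ./cluster-autoscaler
        - --cloud-provider=aws
        - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster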

When to Use Cluster Autoscaler?

• Large workloads that outgrow the existing node capacity.

• Cost optimization by removing underutilized nodes.


When to Use HPA, VPA, or CA?

Feature   | HPA (Pods)         | VPA (Resources)            | Cluster Autoscaler (Nodes)
Scales    | Pods               | CPU & Memory               | Worker Nodes
Handles   | Increased traffic  | Unpredictable workloads    | Insufficient node capacity
Best for  | Web servers, APIs  | Databases, ML, Batch Jobs  | Large clusters
Requires  | Metrics Server     | VPA Recommender            | Cloud provider integration

Conclusion

By combining HPA, VPA, and Cluster Autoscaler, Kubernetes enables a fully automated scaling mechanism, ensuring:

• High availability for workloads.

• Optimized resource utilization (no over/under-provisioning).

• Cost efficiency by removing unused resources.

Which auto-scaling method have you used the most? Let’s discuss in the comments!

#Kubernetes #DevOps #CloudComputing #AutoScaling #HPA #VPA #ClusterAutoscaler #Scalability #PerformanceOptimization
