Google Kubernetes Engine Cost Optimization Best Practices


To optimize costs for Google Kubernetes Engine (GKE), there are several best-practice approaches, outlined below.


Autoscaling Strategies

First, it is important to understand GKE autoscaling strategies:

  • Horizontal Pod Autoscaling changes the shape of your Kubernetes workload by automatically increasing or decreasing the number of pods in response to the workload's CPU or memory consumption, or in response to custom metrics reported from within Kubernetes or external metrics from sources outside of your cluster.
  • Vertical Pod Autoscaling frees you from having to think about what values to specify for a container's CPU and memory requests. The autoscaler can recommend values for CPU and memory requests and limits, or it can automatically update the values. (Use only one of HPA or VPA on the same resource metric at a time to prevent scaling conflicts.)
  • The Cluster Autoscaler is designed to add or remove nodes based on demand. When demand is high, cluster autoscaler will add nodes to the node pool to accommodate that demand. When demand is low, cluster autoscaler will scale your cluster back down by removing nodes. This allows you to maintain high availability of your cluster while minimizing superfluous costs associated with additional machines.
  • Node Auto-Provisioning (NAP) adds new node pools that are sized to meet demand. Without node auto-provisioning, the cluster autoscaler only creates new nodes in the node pools you've specified, so the new nodes are the same machine type as the other nodes in that pool. NAP is well suited to batch workloads and other apps that don't need extreme scaling speed, since creating a node pool that is specifically optimized for your use case can take more time than simply adding nodes to an existing pool.
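As a minimal sketch of the first strategy above, a Horizontal Pod Autoscaler that scales a hypothetical `web` Deployment between 2 and 10 replicas based on average CPU utilization could look like this (the Deployment name, replica bounds, and threshold are illustrative assumptions):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                    # hypothetical Deployment name
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60   # scale out above 60% average CPU
```

Note that utilization is computed against each Pod's CPU request, so HPA only behaves predictably when requests are set.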


Proactive Architecture Decision

It is possible to leverage GKE Autopilot to let GKE maximize the efficiency of your cluster's infrastructure. You don't need to monitor the health of your nodes, handle bin-packing, or calculate the capacity that your workloads need.

In specific cases, GKE Standard may be a better fit, depending on Autopilot's limitations: https://cloud.google.com/kubernetes-engine/docs/concepts/choose-cluster-mode#when-use-standard


Proactive Approaches

  • Use Spot VMs for Kubernetes node pools when your pods are fault-tolerant and can terminate gracefully in less than 25 seconds.
  • Choose cost-efficient machine types (for example, E2, N2D, T2D), which can provide 20–40% better price-performance.
  • Use resource quotas in multi-tenant clusters to prevent any tenant from using more than its assigned share of cluster resources.
  • Schedule automatic downscaling of development and test environments after business hours.
  • Manage and respond to alerts programmatically by using the Pub/Sub and Cloud Run functions services.
  • Provision resources in the lowest-cost region that meets the latency requirements of your workload. To control where resources are provisioned, you can use the organization policy constraint gcp.resourceLocations.
  • Committed use discounts (CUDs) are ideal for workloads with predictable resource needs. After migrating your workload to Google Cloud, find the baseline for the resources required, and get deeper discounts for committed usage.
  • Use Pod affinity and anti-affinity wisely to utilize your regional cluster's resources even better. https://kubernetes.io/docs/tasks/configure-pod-container/assign-pods-nodes-using-node-affinity/
  • Apply bin-packing efficiently. A bin-packing problem is one in which you must fit items of various volumes and shapes into a finite number of regularly shaped “bins” or containers, using as few bins as possible. This is the same challenge you face when optimizing Kubernetes clusters for the applications they run: you have a number of applications, likely with varying resource requirements (e.g., memory and CPU), and you must fit them as efficiently as possible onto the infrastructure resources Kubernetes manages for you, which is where most of your cluster's cost likely lies.
  • A multi-tenancy cluster allows for multiple users or teams to share one cluster for their workloads while maintaining isolation and fair resource sharing. This is achieved by creating namespaces. Namespaces allow multiple virtual clusters to exist on the same physical cluster.
  • Design Pod Disruption Budgets (PDBs) wisely. PDBs are a Kubernetes feature that ensures the availability of a minimum number of running pods during voluntary disruptions like node upgrades, scaling events, or manual interventions. If you set a PDB too restrictively, it can block node downscaling and drive up cost.
  • Optimize Service Communication. Prefer local pod-to-pod communication and use service meshes like Istio to manage and route traffic efficiently.
  • Reduce Data Transfer. Compress data and use efficient formats like Protocol Buffers to minimize the amount of data transferred. Use local caching and distributed caching solutions like Redis to cut down on repetitive data transfers.
  • Control Ingress/Egress. Use ingress controllers and network policies to manage and restrict external and internal traffic. Restrict unnecessary traffic with Kubernetes Network Policies and segment your network to limit traffic. Design to keep data and services within the same cluster or region to reduce cross-cluster communication costs.
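Several of the practices above can be sketched as Kubernetes manifests. The names, namespaces, images, and numbers below are illustrative assumptions, not values from this article: a fault-tolerant workload pinned to GKE Spot VM nodes, a per-tenant ResourceQuota, and a conservative PodDisruptionBudget.

```yaml
# Schedule a fault-tolerant Pod onto GKE Spot VM nodes only.
apiVersion: v1
kind: Pod
metadata:
  name: batch-worker              # hypothetical workload
spec:
  nodeSelector:
    cloud.google.com/gke-spot: "true"
  tolerations:                    # needed if the Spot node pool is tainted
    - key: cloud.google.com/gke-spot
      operator: Equal
      value: "true"
      effect: NoSchedule
  containers:
    - name: worker
      image: gcr.io/example/worker:latest   # placeholder image
---
# Cap one tenant's share of a multi-tenant cluster.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a               # hypothetical tenant namespace
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 64Gi
    limits.cpu: "40"
    limits.memory: 128Gi
---
# Keep enough replicas up during voluntary disruptions without
# blocking node upgrades (too strict a PDB can stall downscaling).
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: web                    # hypothetical app label
```

The PDB's `minAvailable` should always be below the workload's replica count; setting them equal prevents the cluster autoscaler from draining any node running those pods.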


Reactive Approaches

  • Leverage Active Assist / Recommendation Hub per organization or project to get system-generated recommendations: https://console.cloud.google.com/active-assist/dashboard
  • Get email notifications for resource usage and cost by configuring budget alerts.
  • Use GKE usage metering to analyze your clusters' usage profiles by namespace and label, and identify the team or application that spends the most, or the environment or component that caused spikes in usage or cost.
  • Configure billing data export to BigQuery to analyze usage details whenever required: https://cloud.google.com/billing/docs/how-to/export-data-bigquery
  • Leverage readiness and liveness probes so that unhealthy pods are detected and replaced promptly, and resources are not wasted.
  • Define and assign labels consistently.
  • Leverage open source tools where applicable such as Kubecost, KubeResourceReport, Kubernetes Event-Driven Autoscaling (KEDA), Cloud Custodian, OpenCost.
  • Use monitoring tools like Prometheus and Grafana to track and analyze network usage for cost optimization.
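The probe guidance above can be sketched in a Pod spec; the image, health endpoint, ports, timings, and resource figures are all illustrative assumptions:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api
spec:
  containers:
    - name: api
      image: gcr.io/example/api:latest   # placeholder image
      resources:
        requests:                 # accurate requests help the scheduler bin-pack
          cpu: 250m
          memory: 256Mi
        limits:
          memory: 512Mi
      readinessProbe:             # gates traffic until the app is ready
        httpGet:
          path: /healthz          # hypothetical health endpoint
          port: 8080
        initialDelaySeconds: 5
        periodSeconds: 10
      livenessProbe:              # restarts the container if it hangs
        httpGet:
          path: /healthz
          port: 8080
        initialDelaySeconds: 15
        periodSeconds: 20
```

Setting realistic requests here also feeds directly into the bin-packing practice from the previous section, since the scheduler places pods based on requests, not actual usage.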


Conclusion

It is extremely important to set guardrails and implement best practices wherever you can in order to maintain sustainable, cost-efficient Kubernetes clusters. Elastic cloud infrastructure is a blessing; however, it requires precision to manage effectively.

FinOps is a cultural mindset. Create and promote learning programs using conventional or online classes, discussion groups, peer reviews, pair programming, and cost-saving games. As shown in Google's DORA research, organizational culture is a key driver for improving performance, reducing rework and burnout, and optimizing cost. By giving employees visibility into the cost of their resources, you help them align their priorities and activities with business objectives and constraints. Also check out: https://cloud.google.com/learn/what-is-finops



#googlecloud #kubernetes #k8s #HPA #VPA #costoptimization #finops #binpacking #gke


More articles by Mehmet Cambaz
