Google Kubernetes Engine Cost Optimization Best Practices


To optimize costs for Google Kubernetes Engine (GKE), there are several best-practice approaches, outlined below.


Autoscaling Strategies

First, it is important to understand GKE autoscaling strategies:

  • Horizontal Pod Autoscaling changes the shape of your Kubernetes workload by automatically increasing or decreasing the number of pods in response to the workload's CPU or memory consumption, or in response to custom metrics reported from within Kubernetes or external metrics from sources outside of your cluster.
  • Vertical Pod Autoscaling frees you from having to think about what values to specify for a container's CPU and memory requests. The autoscaler can recommend values for CPU and memory requests and limits, or it can automatically update the values. (Use only one of HPA or VPA on the same resource metric at a time to prevent scaling conflicts.)
  • The Cluster Autoscaler is designed to add or remove nodes based on demand. When demand is high, cluster autoscaler will add nodes to the node pool to accommodate that demand. When demand is low, cluster autoscaler will scale your cluster back down by removing nodes. This allows you to maintain high availability of your cluster while minimizing superfluous costs associated with additional machines.
  • Node Auto-Provisioning (NAP) adds new node pools that are sized to meet demand. Without node auto-provisioning, the cluster autoscaler only creates new nodes in the node pools you've specified, so the new nodes are the same machine type as the other nodes in that pool. NAP is well suited to batch workloads and other apps that don't need extreme scaling speed, since creating a node pool that is specifically optimized for your use case can take more time than simply adding nodes to an existing pool.
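As a minimal sketch of the first strategy above, a Horizontal Pod Autoscaler that scales a hypothetical `web` Deployment between 2 and 10 replicas based on average CPU utilization could look like this (the Deployment name, replica bounds, and threshold are illustrative assumptions):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                    # hypothetical Deployment name
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60   # scale out above 60% average CPU
```

Note that utilization is computed against each Pod's CPU request, so HPA only behaves predictably when requests are set.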


Proactive Architecture Decision

It is possible to leverage GKE Autopilot to let GKE maximize the efficiency of your cluster's infrastructure. You don't need to monitor the health of your nodes, handle bin-packing, or calculate the capacity that your workloads need.

In specific cases, GKE Standard may be a better fit, depending on Autopilot's limitations: https://cloud.google.com/kubernetes-engine/docs/concepts/choose-cluster-mode#when-use-standard


Proactive Approaches

  • Use Spot VMs for Kubernetes node pools when your pods are fault-tolerant and can terminate gracefully in less than 25 seconds.
  • Choose cost-efficient machine types (for example, E2, N2D, T2D), which can provide 20–40% better price-performance.
  • Use resource quotas in multi-tenant clusters to prevent any tenant from using more than its assigned share of cluster resources.
  • Schedule automatic downscaling of development and test environments after business hours.
  • Manage and respond to alerts programmatically by using the Pub/Sub and Cloud Run functions services.
  • Provision resources in the lowest-cost region that meets the latency requirements of your workload. To control where resources are provisioned, you can use the organization policy constraint gcp.resourceLocations.
  • Committed use discounts (CUDs) are ideal for workloads with predictable resource needs. After migrating your workload to Google Cloud, find the baseline for the resources required, and get deeper discounts for committed usage.
  • Use Pod affinity and anti-affinity wisely to utilize your regional cluster's resources even better. https://kubernetes.io/docs/tasks/configure-pod-container/assign-pods-nodes-using-node-affinity/
  • Apply bin-packing efficiently. A bin-packing problem is one in which you must fit items of various volumes and shapes into a finite number of regularly shaped “bins” or containers, using as few bins as possible. This is the same challenge you face when optimizing Kubernetes clusters for the applications they run: you have a number of applications, likely with varying resource requirements (e.g., memory and CPU), and you must fit them as efficiently as possible onto the infrastructure resources Kubernetes manages for you, which is where most of your cluster's cost likely lies.
  • A multi-tenancy cluster allows for multiple users or teams to share one cluster for their workloads while maintaining isolation and fair resource sharing. This is achieved by creating namespaces. Namespaces allow multiple virtual clusters to exist on the same physical cluster.
  • Design Pod Disruption Budgets (PDBs) wisely. PDBs are a Kubernetes feature that ensures the availability of a minimum number of running pods during voluntary disruptions like node upgrades, scaling events, or manual interventions. If you set a PDB too restrictively, it can block node downscaling and drive up cost.
  • Optimize Service Communication. Prefer local pod-to-pod communication and use service meshes like Istio to manage and route traffic efficiently.
  • Reduce Data Transfer. Compress data and use efficient formats like Protocol Buffers to minimize the amount of data transferred. Use local caching and distributed caching solutions like Redis to cut down on repetitive data transfers.
  • Control Ingress/Egress. Use ingress controllers and network policies to manage and restrict external and internal traffic. Restrict unnecessary traffic with Kubernetes Network Policies and segment your network to limit traffic. Design to keep data and services within the same cluster or region to reduce cross-cluster communication costs.
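Several of the practices above can be sketched as Kubernetes manifests. The names, namespaces, images, and numbers below are illustrative assumptions, not values from this article: a fault-tolerant workload pinned to GKE Spot VM nodes, a per-tenant ResourceQuota, and a conservative PodDisruptionBudget.

```yaml
# Schedule a fault-tolerant Pod onto GKE Spot VM nodes only.
apiVersion: v1
kind: Pod
metadata:
  name: batch-worker              # hypothetical workload
spec:
  nodeSelector:
    cloud.google.com/gke-spot: "true"
  tolerations:                    # needed if the Spot node pool is tainted
    - key: cloud.google.com/gke-spot
      operator: Equal
      value: "true"
      effect: NoSchedule
  containers:
    - name: worker
      image: gcr.io/example/worker:latest   # placeholder image
---
# Cap one tenant's share of a multi-tenant cluster.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a               # hypothetical tenant namespace
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 64Gi
    limits.cpu: "40"
    limits.memory: 128Gi
---
# Keep enough replicas up during voluntary disruptions without
# blocking node upgrades (too strict a PDB can stall downscaling).
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: web                    # hypothetical app label
```

The PDB's `minAvailable` should always be below the workload's replica count; setting them equal prevents the cluster autoscaler from draining any node running those pods.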


Reactive Approaches

  • Leverage Active Assist / Recommendation Hub per organization or project to get system-generated recommendations: https://console.cloud.google.com/active-assist/dashboard
  • Get email notifications for resource usage and cost by configuring budget alerts.
  • Use GKE usage metering to analyze your clusters' usage profiles by namespace and label, and identify the team or application that spends the most, or the environment or component that caused spikes in usage or cost.
  • Configure billing data export to BigQuery to analyze usage details whenever required: https://cloud.google.com/billing/docs/how-to/export-data-bigquery
  • Leverage readiness and liveness probes so that unhealthy pods are detected and replaced promptly, and resources are not wasted.
  • Define and assign labels consistently.
  • Leverage open source tools where applicable such as Kubecost, KubeResourceReport, Kubernetes Event-Driven Autoscaling (KEDA), Cloud Custodian, OpenCost.
  • Use monitoring tools like Prometheus and Grafana to track and analyze network usage for cost optimization.
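The probe guidance above can be sketched in a Pod spec; the image, health endpoint, ports, timings, and resource figures are all illustrative assumptions:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api
spec:
  containers:
    - name: api
      image: gcr.io/example/api:latest   # placeholder image
      resources:
        requests:                 # accurate requests help the scheduler bin-pack
          cpu: 250m
          memory: 256Mi
        limits:
          memory: 512Mi
      readinessProbe:             # gates traffic until the app is ready
        httpGet:
          path: /healthz          # hypothetical health endpoint
          port: 8080
        initialDelaySeconds: 5
        periodSeconds: 10
      livenessProbe:              # restarts the container if it hangs
        httpGet:
          path: /healthz
          port: 8080
        initialDelaySeconds: 15
        periodSeconds: 20
```

Setting realistic requests here also feeds directly into the bin-packing practice from the previous section, since the scheduler places pods based on requests, not actual usage.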


Conclusion

It is extremely important to set guardrails and implement best practices wherever you can in order to maintain sustainable, cost-efficient Kubernetes clusters. Elastic cloud infrastructure is a blessing; however, it requires precision to manage effectively.

FinOps is a cultural mindset. Create and promote learning programs using conventional or online classes, discussion groups, peer reviews, pair programming, and cost-saving games. As shown in Google's DORA research, organizational culture is a key driver for improving performance, reducing rework and burnout, and optimizing cost. By giving employees visibility into the cost of their resources, you help them align their priorities and activities with business objectives and constraints. Also check out: https://cloud.google.com/learn/what-is-finops



#googlecloud #kubernetes #k8s #HPA #VPA #costoptimization #finops #binpacking #gke


More articles by Mehmet Cambaz
