GKE Made Easy: The Essential Guide for Businesses (part 3)

For small to medium-sized enterprises with fluctuating workloads, implementing a scalable and cost-efficient solution on Google Cloud could involve using Google Kubernetes Engine (GKE). GKE allows for the automatic scaling of applications in response to changes in demand, ensuring that the client only pays for the resources they actually use.

In the article “GKE Made Easy: The Essential Guide for Businesses (part 1)”, we presented the fundamental concepts of GKE, such as containers, Pods, Nodes, Node Pools, Clusters, ReplicaSets, Deployments, deployment config files, and resources like Services, ConfigMaps, Secrets, Volumes, Persistent Volumes, and Namespaces.

In the article “GKE Made Easy: The Essential Guide for Businesses (part 2)”, we delved into the autoscaling capabilities of GKE, both horizontal and vertical.

In this article, the last of the series, we will explore the basics of how to optimize costs for GKE.

Optimizing costs involves maximizing the return on what you are spending. Remember, high cost doesn't automatically mean that something is wrong, but you don't want to pay for things that you're not using.

GKE costs are mainly based on three components: a) the managed architecture; b) the resources used, such as CPUs, memory, and storage; c) networking resources and location.

If there is one single thing you should remember from this article, it is that the more you understand the full details of your application, the more opportunities you have to optimize GKE costs. Keeping developers, operators, and everyone else in the dark about how their applications actually run can do more harm than good. Instead, consider spending the time to train your technical teams on Kubernetes best practices, through online classes, code labs, or even articles like this one. The more teams understand and care about what it costs to run their apps, the better they will be able to optimize.

Your organization might have a platform and infrastructure team that manages the clusters, which can be completely separate from the dev team that actually writes the applications. When every team understands the cost implications for the decisions they make, like how many resources an app actually needs, they can all work together to get things optimized.

Multi-tenant clusters

It is recommended to isolate production and development environments, which can be done effectively by creating a separate GKE cluster for each environment. Depending on the production environment's specific compute and memory needs, multiple production clusters can also be created.

For the development environment, however, it is a best practice to have one single multi-tenant cluster, which saves the overhead of running several underused clusters. By using namespaces and policies to keep resources capped and isolated, dev teams get room to experiment without any surprise bills.

Additionally, some add-ons that come with a GKE cluster are important in production but might not be needed for development. For development clusters, consider disabling or limiting add-ons you are not using, such as Cloud Logging and Monitoring, Horizontal Pod Autoscaling, the Kubernetes Dashboard, and kube-dns.

Following the best practice of splitting teams or apps by namespace also gives you the ability to easily see the resource usage and cost for each one. Labels can also be used for filtering.
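
For example, a namespace per team can carry labels that make per-team filtering straightforward in usage and billing reports. A minimal sketch (the team name and label keys below are purely illustrative):

```yaml
# Hypothetical namespace for one dev team; the labels can be used to
# filter resource usage and costs per team or cost center.
apiVersion: v1
kind: Namespace
metadata:
  name: team-frontend
  labels:
    team: frontend
    cost-center: development
```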

Create Role-based Access Control for Namespace access

Provisioning access to namespaced resources in a cluster is accomplished by granting a combination of IAM roles and Kubernetes' built-in role-based access control (RBAC). An IAM role initially gives an account access to the project, while RBAC permissions grant granular access to a cluster's namespaced resources such as pods, deployments, and services.

When managing access control for Kubernetes, Identity and Access Management (IAM) is used to manage access and permissions at the higher organization and project levels.

There are several roles that can be assigned to users and service accounts in IAM that govern their level of access with GKE. RBAC’s granular permissions build on the access already provided by IAM and cannot restrict access granted by it. As a result, for multi-tenant namespaced clusters, the assigned IAM role should grant minimal access.

The IAM role “Kubernetes Engine Cluster Viewer” gives users just enough permissions to access the cluster and namespaced resources.

Within a cluster, access to any resource type (pods, services, deployments, etc.) is defined by either a role or a cluster role; only roles can be scoped to a namespace. While a role indicates the resources and the actions allowed for each resource, a role binding indicates which user accounts or groups that access is assigned to.
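
As a minimal sketch, here is a namespaced role granting read-only access to a few resource types, and a role binding assigning it to a group (the namespace, names, and group address are hypothetical):

```yaml
# Read-only role over pods, services, and deployments in one namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: app-viewer
  namespace: team-frontend
rules:
  - apiGroups: [""]
    resources: ["pods", "services"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "watch"]
---
# Binding that assigns the role to a hypothetical group of developers.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: app-viewer-binding
  namespace: team-frontend
subjects:
  - kind: Group
    name: frontend-devs@example.com
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: app-viewer
  apiGroup: rbac.authorization.k8s.io
```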

Kubernetes Resource Quotas

With the ability to see exactly how resources are being used, you may also want to use Kubernetes Resource Quotas. Resource Quotas let you cap the amount of resources that a namespace can use, which is another great reason to use multi-tenant clusters split up by namespaces. This can help keep any single tenant from consuming too many resources and triggering autoscaling that drives up costs.

A resource quota can specify a limit on object counts (pods, services, stateful sets, etc.), total storage resources (persistent volume claims, ephemeral storage, storage classes), and total compute resources (CPU and memory).

When setting quotas for CPU and memory, you can indicate a quota for the sum of requests (a value that a container is guaranteed to get) or the sum of limits (a value that a container is never allowed to exceed). Note that when a resource quota for CPU or memory exists in a namespace, every container created in that namespace thereafter must have its own CPU and memory limits defined on creation, or have default values assigned in the namespace via a LimitRange.
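
As a sketch, the quota below caps compute resources and pod count for one tenant namespace, and the accompanying LimitRange supplies defaults so containers created without explicit values are still admitted (all values are illustrative):

```yaml
# Quota capping total requests, limits, and pod count in the namespace.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-frontend
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    pods: "50"
---
# Defaults applied to containers that don't declare their own values.
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: team-frontend
spec:
  limits:
    - type: Container
      default:          # default limits
        cpu: 500m
        memory: 512Mi
      defaultRequest:   # default requests
        cpu: 250m
        memory: 256Mi
```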

CI/CD for cost-optimization

Another way to avoid potential issues is to make sure that any configuration changes are reviewed before they get deployed. So if a team unexpectedly updates a config to 1,000 replicas, you can deal with it before it spikes your costs. The Anthos Policy Controller is one way to help automate this by checking, auditing, and enforcing the policies you've created. These policies can help you control security, regulations, and custom business logic that might affect your costs or your application's stability.
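
As a hedged sketch, assuming the replica-limits template from the open-source Gatekeeper policy library (which Policy Controller builds on) is installed, a constraint like the following could reject a Deployment that asks for an unreasonable replica count; check the exact template name and parameters in your version:

```yaml
# Hypothetical constraint: reject Deployments outside 1-100 replicas.
# Assumes the gatekeeper-library "replicalimits" template is installed.
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sReplicaLimits
metadata:
  name: deployment-replica-limits
spec:
  match:
    kinds:
      - apiGroups: ["apps"]
        kinds: ["Deployment"]
  parameters:
    ranges:
      - min_replicas: 1
        max_replicas: 100
```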

Virtual Machines

The virtual machines on which the nodes run are a fundamental part of GKE costs, and choosing the right hardware to run and fit your apps is critical for balancing cost and performance.

The N-series machines are the default for most general workloads on Google Cloud.

Knowing your apps' needs in terms of memory and CPU allows you to stack them on fewer, optimized machines instead of spreading your deployments across a bunch of general-purpose machines. Bin packing is the art of finding the most efficient way to run apps on your machines and minimize wasted space.
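
Accurate resource requests are what make bin packing possible, because the scheduler packs pods based on the values you declare. A minimal sketch (the image and all values are illustrative):

```yaml
# Right-sized requests let the scheduler bin-pack this app with others;
# limits bound its impact on neighbors. All values are illustrative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
        - name: web
          image: gcr.io/example-project/web-app:1.0  # hypothetical image
          resources:
            requests:
              cpu: 250m
              memory: 256Mi
            limits:
              cpu: 500m
              memory: 512Mi
```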

Kubernetes will try to find a node to schedule your pods. And if it can’t find one, it’ll add more nodes according to your node pool configuration.

To help optimize usage, think about the size of your pods. They're not all going to be the same size, but when you have a general sense of each application, you can optimize nodes around that.

In some cases, one larger node can be less expensive than several general nodes to run all the replicas of your pods.

The good news is that GKE offers node auto-provisioning, which automatically figures out optimal machine sizes and then dynamically adds and removes node pools for you.

Preemptible Machines

When creating nodes and node pools in GKE, you can specify whether they use preemptible machines. Preemptible machines are significantly cheaper, costing up to 80% less. However, there are limitations: each node lasts a maximum of 24 hours and can be terminated with very short notice.

So, only apps that are fault tolerant can run on preemptible nodes. Definitely keep in mind how your apps handle things like state, potential unavailability, and less than graceful terminations.
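
For instance, a fault-tolerant batch workload can be steered onto preemptible nodes with a node selector, since GKE labels those nodes with cloud.google.com/gke-preemptible. A sketch with hypothetical names:

```yaml
# Deployment that only schedules onto preemptible nodes via GKE's label.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-worker
spec:
  replicas: 5
  selector:
    matchLabels:
      app: batch-worker
  template:
    metadata:
      labels:
        app: batch-worker
    spec:
      nodeSelector:
        cloud.google.com/gke-preemptible: "true"
      containers:
        - name: worker
          image: gcr.io/example-project/worker:1.0  # hypothetical image
```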

The cluster autoscaler will actually prefer preemptible nodes when adding capacity, since they are cheaper.

Sign up for committed-use discounts

If the workload is predictable, signing up for committed-use discounts is recommended. These are basically contracts where you commit to a one- or three-year term for a certain amount of resources, like CPU or memory. You pay for them regardless of whether you actually end up using them, but you get a significant discount in return.

For GKE, think about whether your resource usage is spiky and whether there is a minimum amount of resources used no matter what. If there is, you can sign up for committed-use discounts to make that baseline cheaper.

Choose the right region

GKE prices differ from region to region. If the application isn't latency sensitive, you can choose a cheaper region. But keep in mind that moving data between regions has a cost associated with it.

Inter-pod affinity and anti-affinity are scheduling configurations that let you control where pods are placed relative to each other.
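
For example, pod affinity with a zone topology key can co-locate two chatty services in the same zone, avoiding cross-zone traffic charges between them. A sketch (the app labels are illustrative):

```yaml
# Pod that must run in the same zone as pods labeled app=web-app.
apiVersion: v1
kind: Pod
metadata:
  name: cache
  labels:
    app: cache
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: web-app
          topologyKey: topology.kubernetes.io/zone
  containers:
    - name: cache
      image: redis:7
```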

Autoscaling

GKE has powerful autoscaling capabilities. GKE can get additional resources when demand increases and shut down or remove resources when demand decreases.

By efficiently setting up autoscaling specific to your workload, you can minimize both waste and costs.

As described in the previous article of this series, “GKE Made Easy: The Essential Guide for Businesses (part 2)”, autoscaling applies to both workloads and infrastructure, and can be either horizontal or vertical. If there's a lot of demand coming in, horizontal autoscaling can increase the number of pods or nodes to handle it. Vertical scaling instead makes pods or nodes bigger, so that each one can handle more of the load.

To manage traffic spikes, use horizontal pod autoscaling (HPA), which reacts faster than vertical pod autoscaling (VPA). Also, be sure to size the buffer correctly, in order to balance support for spikes against cost. Vertical autoscaling can also run in a recommendation-only mode, which lets you review suggested resource values and make scaling decisions yourself.
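
As a minimal sketch, the HPA below scales a hypothetical Deployment on average CPU utilization, while the VPA is left in recommendation-only mode (this assumes vertical pod autoscaling is enabled on the cluster):

```yaml
# HPA: scale web-app between 3 and 20 replicas, targeting 70% CPU.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
---
# VPA in recommendation-only mode: it publishes suggested requests
# without evicting pods, so you can review them before acting.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Off"
```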

Add Pod Disruption Budgets

When setting up autoscaling, think about an appropriate pod disruption budget. You can set the number or the percentage of pods that can be taken down during voluntary disruptions, like upgrades or autoscaling. Think about the minimum number of pods you need to keep running without disrupting your applications' users, and think about that for each application independently. Pod disruption budgets can help keep your apps running as you configure autoscaling without overprovisioning.
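
A minimal sketch of such a budget, keeping at least two replicas of a hypothetical app available during voluntary disruptions:

```yaml
# PDB: never take web-app below two running pods during upgrades
# or autoscaler-driven node removals.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-app-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: web-app
```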

If you have any pods that are supposed to run for a long time without being restarted, you might want to move those over to a separate node pool, so they don’t block the cluster autoscaler.

Cluster Autoscaler

The Cluster Autoscaler is designed to add or remove nodes based on demand. When demand is high, cluster autoscaler will add nodes to the node pool to accommodate that demand. When demand is low, cluster autoscaler will scale your cluster back down by removing nodes. This allows you to maintain high availability of your cluster while minimizing superfluous costs associated with additional machines.

It's important to note that these mechanisms work together: Vertical Pod Autoscaling and Horizontal Pod Autoscaling can reduce CPU and memory demand to the point where a node is no longer needed, and the Cluster Autoscaler then removes it. Combining these tools is a great way to optimize your overall costs and resource usage.

Node Auto-provisioning

The cluster autoscaler is conceived to reduce costs. This means that when traffic increases, the autoscaler will start by trying to create additional nodes of the cheapest type, and move to more expensive node types only if it can't create the cheaper ones.

Node auto-provisioning is the GKE feature dedicated to vertical node autoscaling: it automatically adds new node pools that are sized to meet demand. Without node auto-provisioning, the cluster autoscaler can only create new nodes in the node pools that you've specified.

Conclusion

Understanding your application capacity is an important step to take when choosing resource requests and limits for your application’s pods and for deciding the best auto-scaling strategy.

By load testing your application running on a single pod with no autoscaling configured, you will learn how many concurrent requests your application can handle, how much CPU and memory it requires, and how it might respond to heavy load.

Getting the pod CPU and the pod memory utilization will be useful when configuring your Cluster Autoscaler, resource requests and limits, and choosing how or whether to implement a horizontal or vertical pod autoscaler.

Along with a baseline, you should also take into account how your application may perform after sudden bursts or spikes.

Using a tool of your choice, increase the traffic the application manages and observe changes in CPU and memory utilization.

This will allow you to properly set the pod memory request parameter and decide whether horizontal pod autoscaling needs to be configured, including an appropriate disruption budget.



Written by Mauro Di Pasquale

Google Professional Cloud Architect and Professional Data Engineer certified. I love learning new things and sharing with the community. Founder of Dipacloud.

Written by a human. Misspelling and grammar errors corrected with AI.
