Best Practices For Running Cost-Optimized Kubernetes Applications On Amazon EKS
Ashish Kasaudhan
AWS Ambassador & Community Builder, 46x Certification, Cloud-Native & Platform Engineering Expert | Kubernetes, Docker, CI/CD, & Automation Specialist | DevOps, FinOps, SecOps, & GitOps Professional | Multi Cloud Expert.
This document discusses Amazon EKS features and options, and the best practices for running cost-optimized applications on EKS to take advantage of the elasticity provided by AWS. This document assumes that you are familiar with Kubernetes, AWS, EKS, and autoscaling.
Introduction
Kubernetes is a container orchestration platform that includes a variety of configurations. Containers are used to run applications, and they rely on container images to define all of the resources required. Kubernetes handles these containers by forming pods out of one or more of them. Within a cluster of compute nodes, pods can be scheduled and scaled.
Then there are namespaces, which are used to organize Kubernetes resources like pods and deployments. A namespace can mirror the structure of an organization, for example with one namespace per team or per environment (development, staging, production).
As Kubernetes gains momentum, more businesses and platform-as-a-service (PaaS) and software-as-a-service (SaaS) providers are deploying multi-tenant Kubernetes clusters for their workloads. As a result, a single cluster could be hosting applications from many teams, departments, customers, or environments. Kubernetes multi-tenancy allows businesses to manage a few large clusters rather than many smaller ones, resulting in better resource planning, more effective supervision, and reduced fragmentation.
Some of these businesses with rapidly growing Kubernetes clusters begin to see a disproportionate increase in cost over time. This often happens because teams adopting cloud-based technologies like Kubernetes lack cloud experience, which leads to applications that become unstable during autoscaling.
This document provides best practices for running cost-optimized Kubernetes workloads on EKS. The following diagram outlines this approach.
The foundation of building cost-optimized applications is spreading the cost-saving culture across teams. Beyond moving cost discussions to the beginning of the development process, this approach forces you to better understand the environment that your applications are running in, which in this context is the EKS environment.
In order to achieve low cost and application stability, you must correctly set or tune some features and configurations (such as autoscaling, machine types, and region selection). Another important consideration is your workload type because, depending on the workload type and your application’s requirements, you must apply different configurations in order to further lower your costs. Finally, you must monitor your spending and create guardrails so that you can enforce best practices early in your development cycle.
EKS cost-optimization features and options
Cost-optimized Kubernetes applications rely heavily on EKS autoscaling. To balance cost, reliability, and scaling performance on EKS, you must understand how autoscaling works and what options you have. This section discusses EKS autoscaling and other useful cost-optimized configurations for both serving and batch workloads.
Fine-tune EKS autoscaling
Autoscaling is the strategy EKS uses to let AWS customers pay only for what they need by minimizing infrastructure uptime. In other words, autoscaling saves costs by starting workloads and their underlying infrastructure before demand increases, and by shutting them down when demand decreases.
EKS handles these autoscaling scenarios by using features like the following:
Horizontal Pod Autoscaler
Horizontal Pod Autoscaler (HPA) scales the number of Pods in a Deployment, StatefulSet, or ReplicaSet based on CPU/memory utilization or any custom metric exposed by your application. The HPA works on a control loop: a separate HPA object exists for each Deployment, StatefulSet, or ReplicaSet, constantly compares that workload's metrics against the memory/CPU threshold you specify, and increases or decreases the replica count accordingly. By using HPA, you pay for extra resources only when you need them.
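As a minimal sketch (assuming the metrics-server add-on is installed and using a hypothetical Deployment named wordpress, like the example later in this document), an HPA that targets 70% average CPU utilization could look like the following; the replica bounds and target value should come from your own capacity testing:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: wordpress
spec:
  scaleTargetRef:              # the workload this HPA scales
    apiVersion: apps/v1
    kind: Deployment
    name: wordpress
  minReplicas: 2               # keep a small baseline for availability
  maxReplicas: 10              # cap the scale-out to bound cost
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # scale out when average CPU crosses 70% of requests

The autoscaling/v2 API shown here is the stable HPA API in current Kubernetes versions.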
The following are best practices for enabling HPA in your application:
For more information, see Configuring a Horizontal Pod Autoscaler.
Vertical Pod Autoscaler
Unlike the Horizontal Pod Autoscaler (HPA), the Vertical Pod Autoscaler (VPA) automatically adjusts the CPU and memory attributes of your Pods. The VPA recreates your Pod with suitable CPU and memory values, which frees up CPU and memory for other Pods and helps you better utilize your Kubernetes cluster. Worker nodes are used efficiently because Pods request exactly what they need. The VPA can suggest memory/CPU requests and limits, and it can also apply them automatically if enabled by the user. This reduces the time engineers spend on performance and benchmark testing to determine the correct values for CPU and memory requests.
VPA can work in three different modes: Off, in which VPA only computes and exposes recommendations without changing your Pods; Initial, in which VPA applies the recommended requests only when Pods are created; and Auto, in which VPA also updates running Pods by recreating them with the recommended values.
If you plan to use VPA, the best practice is to start in Off mode and let it run for at least 24 hours, ideally one week or more, before acting on its recommendations. Then, only when you feel confident, consider switching to either Initial or Auto mode.
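As a hedged sketch, assuming the VPA components are installed in the cluster (they are an add-on, not part of EKS by default) and targeting the same hypothetical wordpress Deployment, an Off-mode VPA object could look like this:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: wordpress
spec:
  targetRef:                   # the workload VPA observes
    apiVersion: apps/v1
    kind: Deployment
    name: wordpress
  updatePolicy:
    updateMode: "Off"          # recommendation only; no Pods are evicted or mutated

You can then read the suggested requests from the VPA object's status before deciding to switch modes.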
Follow these best practices for enabling VPA, either in Initial or Auto mode, in your application:
Cluster Autoscaler
Cluster Autoscaler (CA) automatically resizes the underlying compute infrastructure. CA provides nodes for Pods that don't have a place to run in the cluster and removes under-utilized nodes. CA is optimized for the cost of infrastructure; in other words, if there are two or more node types in the cluster, CA chooses the least expensive one that fits the given demand.
Certain Pods cannot be restarted by any autoscaler without causing some temporary disruption, so the nodes they run on can't be deleted. For example, system Pods (such as metrics-server and kube-dns) and Pods using local storage won't be restarted. However, you can change this behavior by defining PodDisruptionBudgets (PDBs) for these system Pods and by setting the "cluster-autoscaler.kubernetes.io/safe-to-evict": "true" annotation on Pods using local storage that are safe for the autoscaler to restart. Moreover, consider running long-lived Pods that can't be restarted on a separate node group, so they don't block scale-down of other nodes. Finally, learn how to analyze CA events in the logs to understand why a particular scaling activity didn't happen as expected.
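As an illustration of both techniques, the sketch below shows the safe-to-evict annotation on a hypothetical Pod that uses local storage, and a PDB for metrics-server; the Pod name, image, and label selector are assumptions and will vary with how your add-ons are installed:

apiVersion: v1
kind: Pod
metadata:
  name: cache-worker                    # illustrative Pod using local storage
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "true"   # allow CA to drain this Pod
spec:
  containers:
  - name: worker
    image: busybox
    command: ["sleep", "infinity"]
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: metrics-server-pdb
  namespace: kube-system
spec:
  minAvailable: 1                       # keep at least one replica during voluntary disruptions
  selector:
    matchLabels:
      k8s-app: metrics-server           # label assumed for the metrics-server Deployment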
If your workloads are resilient to nodes restarting inadvertently and to capacity losses, you can save more money by creating a cluster or node group with Spot Instances. For CA to work as expected, Pod resource requests need to be large enough for the Pod to function normally. If resource requests are too small, nodes might not have enough resources and your Pods might crash or have trouble during runtime.
The following is a summary of the best practices for enabling Cluster Autoscaler in your cluster:
For more information, see Autoscaling a cluster.
Karpenter
Karpenter is an open-source, flexible, high-performance Kubernetes cluster autoscaler built with AWS. It helps improve your application availability and cluster efficiency by rapidly launching right-sized compute resources in response to changing application load. Karpenter also provides just-in-time compute resources to meet your application’s needs and will soon automatically optimize a cluster’s compute resource footprint to reduce costs and improve performance.
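As a hedged sketch only (the Karpenter API has changed across releases, so field names vary by version), a v1beta1 NodePool that allows both Spot and On-Demand capacity and consolidates under-utilized nodes might look like the following, assuming an EC2NodeClass named default already exists:

apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
      - key: karpenter.sh/capacity-type    # allow both Spot and On-Demand capacity
        operator: In
        values: ["spot", "on-demand"]
      - key: kubernetes.io/arch
        operator: In
        values: ["amd64"]
      nodeClassRef:
        name: default                      # assumes an EC2NodeClass named "default" exists
  limits:
    cpu: "1000"                            # cap total provisioned CPU to bound cost
  disruption:
    consolidationPolicy: WhenUnderutilized # consolidate under-utilized nodes to save cost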
Managed node groups
Amazon EKS managed node groups automate the provisioning and lifecycle management of nodes (Amazon EC2 instances) for Amazon EKS Kubernetes clusters.
With Amazon EKS-managed node groups, you don’t need to separately provision or register the Amazon EC2 instances that provide compute capacity to run your Kubernetes applications. You can create, automatically update, or terminate nodes for your cluster with a single operation. Node updates and terminations automatically drain nodes to ensure that your applications stay available.
Every managed node is provisioned as part of an Amazon EC2 Auto Scaling group that’s managed for you by Amazon EKS. Every resource including the instances and Auto Scaling groups runs within your AWS account. Each node group runs across multiple Availability Zones that you define.
You can add a managed node group to new or existing clusters using the Amazon EKS console, eksctl, the AWS CLI, the AWS API, or infrastructure-as-code tools such as AWS CloudFormation. Nodes launched as part of a managed node group are automatically tagged for auto-discovery by the Kubernetes Cluster Autoscaler. You can use the node group to apply Kubernetes labels to nodes and update them at any time.
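For example, a hedged eksctl ClusterConfig sketch for a managed node group might look like the following; the cluster name, Region, instance type, and sizes are placeholders:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster            # placeholder cluster name
  region: us-east-1           # placeholder Region
managedNodeGroups:
- name: general-purpose
  instanceType: m5.large
  minSize: 1
  maxSize: 5
  desiredCapacity: 2
  labels:
    workload-type: general    # Kubernetes label applied to every node in the group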
There are no additional costs to use Amazon EKS managed node groups; you pay only for the AWS resources you provision, such as Amazon EC2 instances, Amazon EBS volumes, Amazon EKS cluster hours, and any other AWS infrastructure. There are no minimum fees and no upfront commitments.
To get started with a new Amazon EKS cluster and managed node group, see Getting started with Amazon EKS - AWS Management Console and AWS CLI.
To add a managed node group to an existing cluster, see Creating a managed node group.
Choose the right machine type
Beyond autoscaling, other configurations can help you run cost-optimized Kubernetes applications on EKS. This section discusses choosing the right machine type.
Spot Instances
Amazon EC2 Spot Instances are spare EC2 capacity that offers discounts of 70-90% compared to On-Demand prices. The Spot price is determined by long-term trends in supply and demand and the amount of unused capacity for a particular instance size, family, Availability Zone, and AWS Region.
If the available On-Demand capacity of a particular instance type is depleted, the Spot Instance receives an interruption notice two minutes ahead of time so the workload can wrap things up gracefully. I recommend a diversified fleet of instances, with multiple instance types, created by Spot Fleets or EC2 Fleets.
Whatever the workload type, you must pay attention to the following constraints:
Spot Instances can be a great fit if you are running a short-lived Kubernetes cluster for a proof of concept or a non-production environment, among many other use cases.
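As a hedged sketch, a diversified Spot managed node group can be declared in the managedNodeGroups section of an eksctl ClusterConfig; the instance types, sizes, and taint below are illustrative assumptions:

managedNodeGroups:
- name: spot-workers
  instanceTypes: ["m5.large", "m5a.large", "m4.large"]   # diversify across instance types
  spot: true                                             # request Spot capacity for this group
  minSize: 0
  maxSize: 10
  labels:
    capacity-type: spot
  taints:
  - key: spot
    value: "true"
    effect: NoSchedule        # only Pods that tolerate interruptions land on Spot nodes

Tainting the Spot nodes keeps interruption-sensitive workloads on On-Demand capacity while fault-tolerant workloads opt in with a toleration.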
On-Demand Instance
With On-Demand Instances, you pay for compute capacity by the second with no long-term commitments. You have full control over the instance lifecycle: you decide when to launch, stop, hibernate, start, reboot, or terminate it.
On-Demand Instances are more suitable for running a stable Kubernetes cluster with a mixed pool of instances without purchasing any Reserved Instances or Savings Plans, based on your application requirements.
Reserved Instance
An AWS Reserved Instance is officially described as a "billing discount" applied to the use of an On-Demand Instance in your account. In other words, a Reserved Instance is not actually a physical instance; rather, it is the discounted billing you get when you commit to using a specific On-Demand instance configuration for a long-term period of one or three years.
Reserved Instances are ideal for steady and predictable usage. They can help you save substantially on your Amazon EC2 costs compared to On-Demand pricing because, in exchange for your commitment to pay for all the hours in a one-year or three-year term, the hourly rate is lowered significantly.
Reserved Instances are more suitable for running long-term Kubernetes clusters with a pre-defined set of EC2 instance types and sizes.
Select the appropriate region
When cost is a constraint, where you run your EKS clusters matters. Due to many factors, cost varies per AWS Region, so make sure you are running your workload in the least expensive Region where latency doesn't affect your customers. If your workload requires copying data from one Region to another, for example to run a batch job, you must also consider the cost of moving that data.
Use RIs and Savings Plans
There are three important ways to optimize compute costs, and AWS has the tools to help you with all of them. It starts with choosing the right EC2 purchase model for your workloads, selecting the right instance to fine-tune price performance, and mapping usage to actual demand.
Amazon EC2 Reserved Instances (RIs) provide a significant discount (up to 72%) compared to On-Demand pricing and provide a capacity reservation when used in a specific Availability Zone.
Savings Plans is a flexible pricing model that can help you reduce your bill by up to 72% compared to On-Demand prices, in exchange for a one- or three-year hourly spend commitment. AWS offers three types of Savings Plans: Compute Savings Plans, EC2 Instance Savings Plans, and SageMaker Savings Plans.
Review small development clusters
For small development clusters, such as clusters with three or fewer nodes or clusters that use machine types with limited resources, you can reduce resource usage by disabling or fine-tuning a few cluster add-ons. This practice is especially useful if you have a cluster-per-developer strategy and your developers don’t need things like autoscaling, logging, and monitoring. However, because of the cost per cluster and simplified management, we recommend that you start using a multi-tenancy cluster strategy.
Understand your application capacity
When you plan for application capacity, know how many concurrent requests your application can handle, how much CPU and memory it requires, and how it responds under heavy load. Most teams don't know these numbers, so we recommend that you test how your application behaves under pressure. Try isolating a single application Pod replica with autoscaling turned off, and then run tests that simulate an actual usage load. This helps you understand your per-Pod capacity. We then recommend configuring your Cluster Autoscaler, resource requests and limits, and either HPA or VPA. Then stress your application again, but with more intensity, to simulate sudden bursts or spikes.
Ideally, to eliminate latency concerns, these tests must run from the same AWS Region or Availability Zone that the application is running in. You can use the tool of your choice for these tests, whether it's a homemade script or a more advanced performance tool like Apache Benchmark, JMeter, or Locust.
Make sure your application can grow vertically and horizontally
Ensure that your application can grow and shrink. This means you can choose to handle traffic increases either by adding more CPU and memory or adding more Pod replicas. This gives you the flexibility to experiment with what fits your application better, whether that’s a different autoscaler setup or a different node size. Unfortunately, some applications are single-threaded or limited by a fixed number of workers or subprocesses that make this experiment impossible without a complete refactoring of their architecture.
Set appropriate resource requests and limits
By understanding your application capacity, you can determine what to configure in your container resources. Resources in Kubernetes are mainly defined as CPU and memory (RAM). You configure the amount of CPU or memory required to run your application by using spec.containers[].resources.requests.<cpu|memory>, and you configure the cap by using spec.containers[].resources.limits.<cpu|memory>.
When you’ve correctly set resource requests, the Kubernetes scheduler can use them to decide which node to place your Pod on. This guarantees that Pods are being placed in nodes that can make them function normally, so you experience better stability and reduced resource waste. Moreover, defining resource limits helps ensure that these applications never use all available underlying infrastructure provided by computing nodes.
A good practice for setting your container resources is to use the same amount of memory for requests and limits, and a larger or unbounded CPU limit. Take the following deployment as an example:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: wordpress
spec:
  replicas: 1
  selector:
    matchLabels:
      app: wp
  template:
    metadata:
      labels:
        app: wp
    spec:
      containers:
      - name: wp
        image: wordpress
        resources:
          requests:
            memory: "128Mi"
            cpu: "250m"
          limits:
            memory: "128Mi"
The reasoning for the preceding pattern is founded on how Kubernetes out-of-resource handling works. Briefly, when compute resources are exhausted, nodes become unstable. To avoid this situation, the kubelet monitors and prevents total starvation of these resources by ranking the resource-hungry Pods. When CPU is contended, Pods can be throttled down to their requests. However, because memory is an incompressible resource, when memory is exhausted the Pod needs to be taken down. To avoid having Pods taken down, and consequently destabilizing your environment, you must set the requested memory to the memory limit.
You can also use VPA in recommendation mode to help you determine CPU and memory usage for a given application. Because VPA provides such recommendations based on your application usage, we recommend that you enable it in a production-like environment to face real traffic. VPA status then generates a report with the suggested resource requests and limits, which you can statically specify in your deployment manifest.
Make sure your container is as lean as possible
When you run applications in containers, it's important to follow some practices for building those containers. When running those containers on Kubernetes, some of these practices are even more important because your application can start and stop at any moment. This section focuses mainly on the following two practices: keep your container images as small as possible, and make your application start as fast as possible.
Consider these two practices when designing your system, especially if you are expecting bursts or spikes. Having a small image and a fast startup helps you reduce scale-up latency. Consequently, you can better handle traffic increases without worrying too much about instability.
Set meaningful readiness and liveness probes for your application
Setting meaningful probes ensures your application receives traffic only when it is up and running and ready to accept traffic. Kubernetes uses readiness probes to determine when to add Pods to or remove Pods from load balancers, and liveness probes to determine when to restart your Pods.
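As a hedged sketch, container probes might look like the following fragment of a Pod spec; the HTTP path, port, and timings are placeholders that you should tune to your application:

containers:
- name: wp
  image: wordpress
  readinessProbe:              # removes the Pod from load-balancer endpoints when not ready
    httpGet:
      path: /                  # placeholder; use a cheap endpoint that reflects readiness
      port: 80
    initialDelaySeconds: 10
    periodSeconds: 5
  livenessProbe:               # restarts the container if it stops responding
    httpGet:
      path: /
      port: 80
    initialDelaySeconds: 30
    periodSeconds: 10
    failureThreshold: 3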
Make sure your applications are shutting down according to Kubernetes expectations
Autoscalers help you respond to spikes by spinning up new Pods and nodes, and by deleting them when the spikes finish. That means that to avoid errors while serving, your Pods must be prepared for both a fast startup and a graceful shutdown.
Because Kubernetes asynchronously updates endpoints and load balancers, it’s important to follow these best practices in order to ensure non-disruptive shutdowns:
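As one common pattern (a hedged sketch, not the only approach), a short preStop sleep combined with an adequate terminationGracePeriodSeconds gives endpoints and load balancers time to stop sending traffic before the container exits; the values in this Pod template fragment are illustrative:

spec:
  terminationGracePeriodSeconds: 45     # give in-flight requests time to complete
  containers:
  - name: wp
    image: wordpress
    lifecycle:
      preStop:
        exec:
          command: ["sh", "-c", "sleep 10"]   # wait for endpoint/load-balancer updates to propagate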
Monitor your environment and enforce cost-optimized configurations and practices
In many medium and large enterprises, a centralized platform and infrastructure team is often responsible for creating, maintaining, and monitoring Kubernetes clusters for the entire company. This creates a strong need for resource usage accountability and for making sure all teams are following the company's policies.
Amazon EKS supports Kubecost, which you can use to monitor your costs broken down by Kubernetes resources including pods, nodes, namespaces, and labels. As a Kubernetes platform administrator or finance leader, you can use Kubecost to visualize a breakdown of Amazon EKS charges, allocate costs, and charge back organizational units such as application teams. You can provide your internal teams and business units with transparent and accurate cost data based on their actual AWS bill. Moreover, you can also get customized recommendations for cost optimization based on their infrastructure environment and usage patterns within their clusters. For more information, see the Kubecost documentation on docs.aws.amazon.com.
Important Implementation links:
EKS Cluster with spot instance
Run your Kubernetes Workloads on Amazon EC2 Spot Instances with Amazon EKS | Amazon Web Services (aws.amazon.com)
EKS Cluster with Karpenter
Introducing Karpenter - An Open-Source High-Performance Kubernetes Cluster Autoscaler | Amazon Web Services (aws.amazon.com)
EKS Cluster with Kubecost
Cost monitoring with Kubecost | Amazon EKS User Guide (docs.aws.amazon.com)
EKS Cluster Autoscaling