Saving Money in EKS is a Good Start
Ned Bellavance
Technical Educator and Content Creator | Microsoft MVP 8x | HashiCorp Ambassador 5x
One of the touted benefits of cloud is its elasticity: the ability to scale out a workload as demand rises and, more importantly, to scale in when demand decreases.
When that scale-in event doesn't occur, you end up wasting money on resources you don't need. Based on studies from Platform9 and Datadog, Kubernetes is one of the most wasteful services in the cloud today. Platform9 quotes average EKS utilization at 30%, and Datadog says that 45% of containers are using 30% or less of their requested memory. As a former VMware admin, I can definitely concur that people always request way more memory than they actually need for their workload.
And it's a totally understandable desire to ask for the maximum amount of resources to guarantee your application will function properly. Maybe you think you only need 4 GB and 2 vCPUs, but what if you need more? Getting more resources is going to be a helpdesk ticket and a change window. Better to request worst-case-scenario numbers from the get-go and be able to sleep easy at night.
There are two different problems to be solved here: bin packing and overallocated memory. In my VMware days, you could try oversubscribing your hosts and letting DRS move VMs around. But that was on-premises, where you couldn't elastically scale your hardware. If you suddenly needed more capacity, you were SOL. Cloud removes that restriction.
Platform9 is trying to solve both the bin packing and memory allocation problems with their new Elastic Machine Pooling (EMP) offering for EKS.
What are these elastic machine pools? Platform9 provisions bare metal EC2 instances in AWS and creates VM nodes for your EKS cluster on the bare metal instances. They are bin packing your nodes and using memory oversubscription to reduce the number of EC2 instances you run. If utilization actually starts to approach provisioned resources, they will live migrate nodes to another bare metal instance to free up space.
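To make the bin packing and oversubscription ideas above a little more concrete, here is a minimal Python sketch. It is not Platform9's implementation: the 384 GiB host size, the 1.5x oversubscription ratio, the 90% migration threshold, and the names `Host`, `pack`, and `needs_migration` are all hypothetical, and placement uses a simple first-fit decreasing heuristic chosen for illustration.

```python
from dataclasses import dataclass, field

# Illustrative sketch only: host size, oversubscription ratio, and migration
# threshold are hypothetical values, not anything Platform9 has published.


@dataclass
class Host:
    """A bare metal host (hypothetical RAM size) running VM-based cluster nodes."""
    name: str
    physical_gib: float
    nodes: dict = field(default_factory=dict)  # node name -> provisioned GiB

    def provisioned(self) -> float:
        return sum(self.nodes.values())


def pack(node_sizes: dict, physical_gib: float = 384, ratio: float = 1.5) -> list:
    """First-fit decreasing bin packing with a memory oversubscription ratio.

    A host accepts a node while total *provisioned* memory stays within
    physical_gib * ratio, so ratio > 1.0 oversubscribes physical RAM.
    """
    limit = physical_gib * ratio
    hosts = []
    # Place the largest nodes first; this tends to leave fewer half-empty hosts.
    for name, size in sorted(node_sizes.items(), key=lambda kv: kv[1], reverse=True):
        target = next((h for h in hosts if h.provisioned() + size <= limit), None)
        if target is None:
            target = Host(name=f"metal-{len(hosts)}", physical_gib=physical_gib)
            hosts.append(target)
        target.nodes[name] = size
    return hosts


def needs_migration(host: Host, actual_gib: dict, threshold: float = 0.9) -> bool:
    """True when memory the nodes actually use nears the host's physical RAM."""
    used = sum(actual_gib.get(n, 0.0) for n in host.nodes)
    return used > host.physical_gib * threshold


if __name__ == "__main__":
    nodes = {f"node-{i}": 64 for i in range(9)}  # nine nodes, each provisioned 64 GiB
    print(len(pack(nodes, ratio=1.0)))  # 2 hosts without oversubscription
    print(len(pack(nodes, ratio=1.5)))  # 1 host with 1.5x oversubscription
```

With nine 64 GiB nodes, no oversubscription needs two hosts, while a 1.5x ratio fits all nine on one. That only works while actual usage stays low, which is why something like `needs_migration` has to watch real consumption. A real system would also weigh CPU and pick *which* node to move; this sketch only checks memory.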
It's a novel solution to a persistent problem. But it also creates additional layers of complexity without solving the underlying issue of resource over-allocation, a problem driven by fear or ignorance on the part of those provisioning workloads.
A developer's first priority is to deliver an application that works for end users and meets SLAs (or at least it should be). Underestimating resource requirements runs the risk of not having access to additional capacity if circumstances change. Developers often don't have visibility into resource costs or actual utilization.
EMP certainly has the potential to save organizations a lot of money, and that's a good thing! However, they could save even more money by right-sizing workloads and having reliable auto-scaling. I think Platform9 is a good stop-gap measure and could provide valuable information to close the feedback loop with developers.
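Right-sizing usually starts from observed usage rather than guesses. The Python sketch below shows one simple way to turn memory usage samples into a suggested request, and to flag containers that, like those in the Datadog figure, peak at 30% or less of what they request. The function names, the nearest-rank 95th percentile, the 20% headroom, and the 30% cutoff are assumptions for illustration, not a recommendation from Platform9 or any particular tool.

```python
import math

# Illustrative sketch: the percentile, headroom, and "overprovisioned" cutoff
# are hypothetical policy choices, not values from any specific product.


def percentile(samples_mib: list, pct: int) -> int:
    """Nearest-rank percentile of a non-empty list of integer samples."""
    ordered = sorted(samples_mib)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def recommend_request(samples_mib: list, pct: int = 95, headroom_pct: int = 20) -> int:
    """Suggest a memory request: the usage percentile plus headroom, rounded up."""
    p = percentile(samples_mib, pct)
    # Integer ceiling division avoids floating-point rounding surprises.
    return (p * (100 + headroom_pct) + 99) // 100


def is_overprovisioned(samples_mib: list, requested_mib: float, cutoff: float = 0.3) -> bool:
    """True when peak observed usage is at or below `cutoff` of the current request."""
    return max(samples_mib) <= requested_mib * cutoff


if __name__ == "__main__":
    usage = [900, 950, 1000, 1020, 1100, 1150, 1180, 1200, 1210, 1220]  # MiB
    print(recommend_request(usage))         # suggested request in MiB
    print(is_overprovisioned(usage, 4096))  # a 4 GB request for this workload
```

For a workload that asked for 4 GB "just in case" but never exceeds about 1.2 GiB, this suggests a request of roughly 1.4 GiB. Feeding numbers like these back to application teams is one way to close the feedback loop described above.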
In the long term, you should endeavor to fix the root issue and change the behavior of your application teams. It's tempting to try and "fix" a problem with a tool. People are difficult and apps are easy. Just don't forget that failing to address the root cause has a significant cost as well, one that is unlikely to shrink as your Kubernetes usage grows.
#cfd19
Chief Technology Advisor - The Futurum Group
One of the more passionate topics of the week. Who doesn't want to save on cloud compute? The challenge is: can you do it without challenging the developer to do the extra work to right-size resources? Fascinating topic.
Ned Bellavance, it was good to see you at Cloud Field Day. Your point about addressing the root cause and changing the behavior of application teams is well-taken. This is a cultural, process, and behavioral change, and it takes a long time. In some companies, it may never get resolved. At Platform9, we see EMP as part of a larger strategy that includes educating teams and evolving practices over time. The feedback we gather from EMP's usage will be instrumental in developing more refined and automated solutions that take some of the burden away from developers. Thanks for your feedback.