Is Platform9's cost optimization worth its complexity?
Joep Piscaer
DevRel Leader building credible developer-focused community participation | socio-technical CTO | Platform Engineering aficionado | Industry Analyst | Devopsdays Organizer
Today, at Cloud Field Day 19, Platform9 showcased their cost optimization product, EMP, or Elastic Machine Pool.
In a nutshell, this tool optimizes cloud spend. Unlike some other cost optimization tools, Platform9's solution can actively optimize for cost without any pod downtime. It does so by running AWS bare metal instances with Platform9's (KubeVirt-based) virtualization layer on top, which hosts 'Elastic VMs' (EVMs) that are spec'ed like EC2 instances. By oversubscribing the bare metal instance's resources across EVMs, EMP can start to optimize cost. Additionally, the virtualization layer allows for live migration, so pods don't need to be restarted for optimization to happen.
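To make the oversubscription idea concrete, here is a minimal back-of-the-envelope sketch. It is not Platform9's algorithm, and every number in it (VM count, VM size, host size, oversubscription ratio) is a hypothetical value picked for illustration. The helper `hosts_needed` is my own name and only does capacity arithmetic; it ignores memory, fragmentation and actual utilization.

```python
import math

def hosts_needed(vm_vcpus, host_vcpus, oversubscription=1.0):
    """Hosts required when each host may promise
    host_vcpus * oversubscription virtual CPUs in total.

    Hypothetical helper: plain capacity arithmetic, not a real scheduler.
    """
    capacity = host_vcpus * oversubscription
    return math.ceil(sum(vm_vcpus) / capacity)

# Hypothetical numbers, for illustration only.
vms = [4] * 48   # 48 VMs spec'ed at 4 vCPUs each: 192 vCPUs requested
host = 96        # vCPUs on one assumed bare metal instance

without = hosts_needed(vms, host)                        # no oversubscription
with_2x = hosts_needed(vms, host, oversubscription=2.0)  # assumed 2:1 ratio

print(without, with_2x)
```

Under these assumed numbers, a 2:1 ratio halves the number of bare metal hosts. The saving only holds if the VMs really use far less than they request, which is exactly the over-allocation problem this kind of product targets.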
After a great off-camera discussion, a couple of things come to mind.
Is the complexity worth the cost savings?
One of the mantras I live by is: don't solve organizational problems with technical solutions. Cost management is a non-technical problem, although the current (late-stage) OpEx and subscription craze is starting to come to an end. My first urge is to resist any technical solution.
Instead, I am very much in the camp of taking the harder, longer-term road of educating users of infrastructure, including developers, to do the right thing (right). However, if, and it's a big if, you take the quicker technical solution road, I'm a fan of solving the problem end-to-end. In the case of cost management, that includes implementing the proposed cost savings through automation and technical means. The jury is still out, though, on how drastic the technical measures (implementing bare metal EC2 nodes, KubeVirt, a scheduler, etc.) are allowed to be to solve the cost optimization problem. EMP as a product adds many customized layers of complexity that are potentially hard to dismantle in the future, leading to technical debt and snowflake deployments.
What I'm saying here, though, is that buying a solution wouldn't be my first choice at all, since the solution doesn't correct the underlying problem of balancing an application's performance with the cost associated with over-provisioning resources.
But if you do, EMP's end-to-end approach looks like a great way of going about it, especially given that their target customer runs Kubernetes but isn't necessarily an expert in it. As the product stands now, it's already well-placed to save lots of money for those who have resource over-allocation issues, and it does so in a technically sound way, with tangible advantages coming from the technical implementation of bare metal servers, custom virtualization and a scheduler.
Should a non-functional solution be in-band at all?
The thing that worries me a little more, though, is the fact that it's an in-band solution, replacing how my workloads run. That makes the technical sell much harder: I now have to vet the technology, because my (critical) workloads run on it. In Dutch we have this saying: trust comes on foot but leaves on horseback, meaning that building trust takes time, but it is lost quickly. I think that applies to this solution, as the technical implementation is pretty drastic, with a potentially large impact on security, manageability, etc.
I don't like that it replaces a bunch of native AWS and/or vanilla Kubernetes parts for the sole purpose of cost optimization. What's more, the benefits of the solution being in-band are substantial, but not critical for me to get cost savings. For instance, I wouldn't mind the optimized pod configuration only becoming active at the first lifecycle event, such as an application-level version change, which would negate the need for the bare metal servers, virtualization layer and scheduler in the first place.
A small but adjacent worry is that I need to actively move my applications to these instances before I can start optimizing cost, meaning a migration from my existing node pool of Kubernetes worker nodes to the Elastic Machine Pool.
Wrapping Up
The technologist in me likes the technical implementation details of how they accomplish cost optimization. The organizational consultant in me doesn't like it, and really prefers that we solve the root cause of problems instead of applying band-aids (especially if the band-aid is technical and the root cause is organizational).
Long story short: I wouldn't buy a cost optimization solution. But if I did, Platform9 Elastic Machine Pool would be high on my list.