Clusterf*ck with the Cluster (Part 1): Why Simply Moving to the Cloud Won’t Save You Money

So, you’ve decided to lift and shift your on-prem cluster to the cloud: you need the scale and all the global services discussed in the previous post, and you feel your footprint is finally big enough to benefit from the cloud’s economy of scale. Now you are expecting cost savings and operational bliss. Spoiler alert: that’s not how it works. If anything, your cloud bill will come as a shock.

1. Cloud DTUs Are More Expensive Than On-Prem DTUs

Unless you are so huge that you get a bulk discount (or were lured in by a generous cloud entry subsidy), your DTUs are more expensive. This is not a wildly popular topic, but if you take what you pay for provisioned DTUs and compare it to a machine in a rack with the same amount of compute, storage, and memory, the server capable of delivering the same DTUs will be pleasantly cheaper.

Note: a word of caution. I’m not saying that clouds are evil. I’m saying that to realize the benefits of the cloud (beyond the close-to-zero cost of starting up your product or service), you have to design for cloud-native, which is what enables cost savings and an entirely different build velocity. If you want to operate on-prem style and still get some cost savings, you may want to consider something like Equinix, which follows much the same philosophy but is a different means to an end.

2. Hidden DTUs

“It’s just a couple of deployments with a few pods each, so it’s bound to be cheap, right?” Wrong. When people move their clusters to the cloud, they bring along everything: telemetry, CI/CD, security guardrails, scanners, orchestration, etc. After all, you’ve invested so much in these that you don’t want to let them go, do you? But setting them up in the cloud isn’t free: agents, exporters, batch jobs, and all the integrations come at a cost. Running your own pipelines, replicating security policies, isolating workloads: each additional component means extra nodes, deployments, and pipeline runs. Every one of these is an enterprise app in its own right, with an enterprise price tag to operate.
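
To get a feel for how much of the cluster goes to supporting tooling rather than to the product itself, a rough tally helps. The sketch below is illustrative only: the `Workload` type, the workload names, and the CPU figures are all hypothetical placeholders, not measurements from a real cluster.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    cpu_cores: float   # CPU requested across all replicas (hypothetical figures)
    is_platform: bool  # True for tooling (telemetry, CI/CD, scanners), False for the product


def platform_overhead_share(workloads: list[Workload]) -> float:
    """Return the fraction of requested CPU consumed by platform tooling."""
    total = sum(w.cpu_cores for w in workloads)
    if total == 0:
        return 0.0
    platform = sum(w.cpu_cores for w in workloads if w.is_platform)
    return platform / total


# Made-up inventory: "a couple of deployments" plus everything that came along.
inventory = [
    Workload("product-api", 4.0, False),
    Workload("product-worker", 2.0, False),
    Workload("telemetry-agents", 1.5, True),
    Workload("ci-runners", 3.0, True),
    Workload("security-scanners", 1.5, True),
]

share = platform_overhead_share(inventory)
print(f"Platform tooling share of requested CPU: {share:.0%}")
```

With these made-up inputs, half of the requested CPU goes to tooling. Plugging in the requests from your own cluster gives the number that actually matters.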

3. Extra effort

First, you need to prepare and move a lot of stuff. Then you have to reconfigure it. Then you need to replace cloud-incompatible stuff with cloud-compatible stuff. Then all of it must be integrated into the infra pipelines (you don’t want snowflake clusters, do you?). After all that, it’s not uncommon to see a cluster deployment take 8 hours to complete (I kid you not, and that’s not the longest I’ve seen). What’s the probability that such a cluster will be re-paved every now and then? Guess what will happen when you finally decide it’s time to re-pave, and how much the downtime will cost you? And boom: now you have a “cloud ops” team and a bunch of new service requests and incident workflows (with a price tag attached), not the promised “DevOps organization.”

4. On-prem habits disable cost-saving tools

Of course, you had some kind of FinOps when you were on-prem. Most likely, every product/service/team was allocated a percentage of the cluster, right? You spin up an environment to test something and leave it running for a week because redeploying takes effort. But that’s fine: we are under our allocated percentage, and going even lower brings no bonus, only extra effort. You keep static workloads alive 24/7 because “we’ve got reserved capacity anyway.” This is how all the benefits of elasticity evaporate. The only kind of “elastic” I see in this design is “go up on consumption spikes”, never “shut down when not needed.”
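
The arithmetic behind “shut down when not needed” is simple enough to sketch. The example below is a minimal illustration, assuming a hypothetical hourly rate and a test environment that is only needed 10 hours a day, 5 days a week; none of these figures come from a real bill.

```python
HOURS_PER_WEEK = 24 * 7  # 168


def weekly_cost(hourly_rate: float, hours_running: float) -> float:
    """Cost of keeping an environment up for the given number of hours."""
    return hourly_rate * hours_running


def savings_fraction(hours_needed: float, hours_running: float = HOURS_PER_WEEK) -> float:
    """Fraction of spend avoided by running only while the environment is needed."""
    if hours_running <= 0:
        raise ValueError("hours_running must be positive")
    return 1 - hours_needed / hours_running


# Hypothetical inputs: replace with your own rate and usage pattern.
hourly_rate = 2.0        # assumed cost per hour for the test environment
hours_needed = 10 * 5    # used 10 hours a day, 5 days a week

always_on = weekly_cost(hourly_rate, HOURS_PER_WEEK)
on_demand = weekly_cost(hourly_rate, hours_needed)
saved = savings_fraction(hours_needed)

print(f"always on: {always_on:.2f}, on demand: {on_demand:.2f}, saved: {saved:.0%}")
```

Under these assumptions the always-on environment costs more than three times as much per week as the one that is torn down when idle, which is exactly the saving the “we are under our allocation anyway” habit leaves on the table.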

5. Operations anti-patterns

If you need to submit a request for anything but a subscription/resource group in the cloud, you are operating your cloud like just another datacenter. Remember: the biggest wins in the cloud come not from “$ per DTU” but from “$ to deliver one value point”. All these fantastic CLIs, SDKs, and self-service portals were introduced for one simple reason: to eliminate the need for form-filling and ops involvement (it even sounds offensive, doesn’t it?). Now, this does not mean ops are no longer needed; it means ops are now architects enabling core services.
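
One way to make “$ to deliver one value point” concrete is to put infrastructure spend and people time into the same formula. The sketch below is only an assumption about how such a metric could be computed: the function name, the labor rate, and both scenarios are hypothetical, and “value points” stands for whatever unit of delivered value your organization tracks.

```python
def cost_per_value_point(
    infra_cost: float,
    labor_hours: float,
    labor_rate: float,
    value_points: float,
) -> float:
    """Total spend (infrastructure plus people time) per delivered value point."""
    if value_points <= 0:
        raise ValueError("value_points must be positive")
    return (infra_cost + labor_hours * labor_rate) / value_points


# Hypothetical month, ticket-driven: cheaper infrastructure, but more hours
# spent on requests and waiting, and fewer value points delivered.
ticket_driven = cost_per_value_point(
    infra_cost=8_000, labor_hours=400, labor_rate=80, value_points=20
)

# Hypothetical month, self-service: a bigger cloud bill, fewer hours lost,
# more value points delivered.
self_service = cost_per_value_point(
    infra_cost=10_000, labor_hours=250, labor_rate=80, value_points=30
)

print(f"ticket-driven: {ticket_driven:.0f} per value point")
print(f"self-service:  {self_service:.0f} per value point")
```

In this made-up comparison the setup with the higher infrastructure bill still delivers each value point at half the cost, which is the point of looking past “$ per DTU”.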

6. “Extinguishing” fire by pouring gas into it

Moving to the cloud doesn’t magically remove process friction. Security and ops teams demand extra precautions because your workloads are no longer hidden behind a Big Company firewall. Reviews take longer, approvals pile up, and deployment cycles slow down. What’s the solution? You guessed right: let’s pour some gasoline on the fire and give everyone their own cluster. Companies allocate clusters to products, departments, or cost centers and turn on autoscaling, hoping it will help (it does not, as it mainly scales up, to address spikes and a growing number of workloads). They throw in more semi-static, hours-to-build, hundreds-of-man-hours-to-operate clusters, and the cost skyrockets.


That’s how companies end up running semi-static clusters (or, even worse, farms of them) and planning a cloud exit. The result? Double-digit thousands in TCO, without counting the hidden cost of constrained throughput and slowed-down processes. Everyone is disappointed: cloud was just another hype, after all.

So, What Can You Do to Avoid This?

The answer is counterintuitive. You abandon your common sense and throw out of the window all the principles that were your “north star” until now, including being cloud-agnostic. But that is a post I still need to write: "Part 2 - how to save money in the cloud?"

This is a critical topic, as many organizations are navigating the complexities of cloud management. What specific strategies do you recommend to avoid common pitfalls in cloud spending while maximizing value?

Nataliia Moroz

Product Manager - EPAM Systems

1 month ago

Hey Denis, great article! You've really hit the nail on the head - lift and shift just isn't the way to go when moving to the cloud. Totally agree! Thanks for the wake-up call; it's a real eye-opener for those who might be underestimating how complex cloud transitions can be. I totally get where you're coming from - rethinking how we manage and use these services is key, not just doing a simple tech swap. Can't wait for Part 2! I've got some FinOps insights to share too, which could spark a lively discussion. Looking forward to more!

Dmitry Shitov

Head of financial systems development

1 month ago

Tends to be like that

Fantastic! It reads like a whodunnit and ends with a cliffhanger. I look forward to the next part. Thank you.
